In this work, we study a general downlink linear precoder design problem in a multi-cell heterogeneous network (HetNet), in which macro/pico base stations (BSs) are densely deployed within each cell. The problem is formulated in a very general setting as the users' sum-utility maximization problem, in which each user's utility is directly related to its achievable rate. Our formulation includes many practical precoder design problems as special cases, such as multi-cell coordinated linear precoding, full and partial per-cell coordinated multi-point (CoMP) transmission, zero-forcing precoding, and joint BS clustering and beamforming/precoding.

The general sum-utility maximization problem is difficult due to its non-convexity as well as the coupling of all users' precoders through matrix-valued multiuser interference. In this paper, we propose to use a novel convex approximation technique to approximate the original problem by a series of convex subproblems, each of which decomposes across all the cells (or BSs). The convexity of the subproblems allows for efficient computation, while their decomposability leads to distributed implementation. Our approach is made possible by the identification of certain key convexity properties of the sum-utility objective. Moreover, in many important network settings, the overall computational complexity can be further reduced by solving the per-cell subproblems, in either an exact or an inexact manner, using customized algorithms. Extensive simulation experiments show that the proposed framework is quite effective for solving interference management problems in many practical settings.
I. Introduction

The heterogeneous network (HetNet) has recently emerged as a promising wireless network architecture capable of accommodating the explosive demand for wireless data [1]. In a HetNet, the upper-tier high-power BSs such as macro BSs provide per-cell interference management as well as blanket coverage, while in the lower tier, low-power access points such as micro/pico/femto BSs and relays are densely deployed to provide capacity extension (see Fig. 1). This new paradigm of network design brings the transmitters and receivers closer to each other, and is thus able to provide high link quality with low transmission power [2], [3].




Fig. 1. The Multi-Tiered Structure of the HetNet. 

Due to the large number of potentially interfering nodes in the network, one of the key challenges in the design of effective resource allocation schemes for the HetNet is to properly mitigate both the inter-cell and the intra-cell multiuser interference. Interference management for the HetNet, or for general interference networks, has been under extensive research recently [4]. Introducing appropriate coordination among the nodes in the network, either in the physical layer or in higher layers, has been shown to be an effective means for this purpose [5]–[7]. For networks whose nodes are equipped with multiple antennas, the major approaches for physical-layer coordinated interference management include coordinated beamforming (CB) and joint processing (JP) [8]. The CB approach allows the nodes to coordinate at the beamformer/precoder level: by jointly designing the transmit/receive beamformers/precoders of all the nodes in the network, excessive interference can be avoided [9]–[14]. The JP approach, also known as coordinated multi-point (CoMP) transmission, optimizes the transceiver structures assuming that the users' data are available at all the BSs. For example, in a downlink network, a single virtual BS can be formed if the users' data are shared among all transmitting BSs. In this way, transmission schemes designed for single-cell MIMO broadcast channels, such as non-linear dirty-paper coding (DPC) [15]–[18] and linear precoding schemes [19]–[22], can be used for coordinated transmission. However, due to the high signaling overhead associated with implementing full JP for all the BSs in the network, a combination of these two approaches is usually adopted in the HetNet setting. For example, JP can be performed on a per-cell basis to cancel intra-cell interference, while CB is used to mitigate inter-cell interference [8], [20].

Regardless of the coordination scheme used, interference management is usually formulated as a problem that optimizes a certain system utility function, which is directly related to the users' individual rates [4]. These utility functions, when chosen properly, can balance the network spectrum efficiency against the fairness among the users. Unfortunately, it has recently been established that, for a large class of utility functions, the associated optimization problems are difficult to solve (except for a few special cases; see [14], [23]–[25]). As a result, many practical low-complexity algorithms that compute high-quality solutions to interference management problems have recently been developed. The key to designing practical algorithms for interference management is to recognize certain convexity and/or decomposability properties of the underlying utility maximization problems. Convexity leads to efficient computational algorithms, while decomposability is crucial for distributed implementation, especially in large-scale networks [26], [27].

Reference [28] was the first to recognize that in a MIMO interference channel (IC), an individual user's achievable rate is concave in its own transmit covariance while at the same time convex in all other interfering users' transmit covariances. Similar concave-convex properties were later established in other interference networks such as multi-cell OFDMA networks [29], and have since been leveraged heavily to design resource allocation algorithms in various network settings [9], [11], [25], [29]–[34]. The concave-convex property allows one to obtain, for each individual user, an approximate version of the sum-rate objective by linearizing its convex part while keeping its concave part unchanged. In this way, the users can successively optimize their transmit strategies by solving a series of convex subproblems. Convex approximation approaches that are not based on the concave-convex property are also possible; see, e.g., [14], [35]. However, none of the schemes cited above decomposes well across the nodes: for the schemes based on the concave-convex structure, the convexification can only be done for one node at a time, so only a single node can update its transmit strategy in each iteration; for the algorithm proposed in [35], the convex subproblem is still coupled among all the users. Moreover, these schemes are mainly designed for peer-to-peer networks, with each transmitter dedicated to a single receiver. Hence they are not suitable for the HetNet setting, where each BS can transmit to multiple users while each user can receive from multiple BSs as well. A few recent works have attempted to address these drawbacks [36]–[39]; however, no theoretical convergence analysis is available for these algorithms.

Nevertheless, a decomposable structure in interference management problems is highly desirable: when judiciously exploited, it can lead to efficient distributed implementations. This fact has long been recognized for other important large-scale network optimization problems such as network utility maximization (NUM) problems (a class of problems with convex, separable objectives and coupled constraints). See [27] for a recent survey of techniques for achieving decomposability across the users in NUM-related problems. However, unlike in NUM problems, in interfering networks the nodes are tightly coupled in a nonlinear manner through multi-user interference. As a result, even in relatively simple network settings with a set of single-antenna transceiver pairs communicating simultaneously (i.e., the interference channel model), a decomposable structure is difficult to come by. In this case, a simple way to achieve decomposability is to let each transceiver pair optimize completely on its own, disregarding the interference it generates to the other nodes in the system [40]–[43]. Despite their simplicity, algorithms of this type suffer from non-convergence and low throughput when the multi-user interference is high. Recently, a weighted minimum mean square error (WMMSE) algorithm was proposed in [44] for general utility optimization problems in interfering broadcast channels (IBC). Surprisingly, in each of its steps, computation is completely decoupled among the interfering BSs in the network. Such decomposition is achieved by establishing a key equivalence relationship between the utility maximization problem and a weighted MSE minimization problem. Locally optimal solutions of the latter problem can be obtained by solving three subproblems alternatingly, each of which can be decoupled completely across the users. However, it is not clear whether the derived equivalence relationship still holds for the general HetNet setting.

To the best of our knowledge, to this point there is no general approach for decomposing interference-coupled 
utility maximization problems in HetNet, or in general interference networks. In this work, we propose to achieve 
decomposability by means of successive convex approximation. Central to our approach is a key observation that 
reveals certain hidden convexity for a wide range of sum-utility maximization problems. The identified property 
allows us to approximate the original non-convex problem by a series of convex subproblems, each of which is 
completely decoupled among the nodes in the network. Based on different ways in which the convex subproblems 
are solved, two low-complexity algorithms are proposed, each bearing wide applicability in interference management 
problems. 

The rest of the paper is organized as follows. In Section II, we outline a general system model for interference management in HetNet and describe its applicability to many important practical problems. In Section III, we present a key convexity structure of the considered utility maximization problem, which leads to a general successive convex approximation algorithm. In Section IV, the proposed general algorithm is specialized to different interference management scenarios. In Section V, a useful extension of the algorithm is developed and its application is discussed. Numerical results are given in Section VI, and concluding remarks are provided in Section VII.

Notations: For a Hermitian matrix $X$, $X \succeq 0$ signifies that $X$ is positive semi-definite. We use $\mathrm{Tr}[X]$, $|X|$, $X^H$, $\rho(X)$ and $\mathrm{Rank}(X)$ to denote the trace, determinant, Hermitian transpose, spectral radius and rank of a matrix, respectively. The $(m,n)$-th element of a matrix $X$ is denoted by $X[m,n]$. We use $I_n$ to denote an $n \times n$ identity matrix. Moreover, we let $\mathbb{R}^{N\times M}$ and $\mathbb{C}^{N\times M}$ denote the sets of real and complex $N \times M$ matrices, and use $\mathbb{S}^N$, $\mathbb{S}^N_{+}$, $\mathbb{S}^N_{++}$ to denote the sets of $N \times N$ Hermitian, Hermitian positive semi-definite and Hermitian positive definite matrices, respectively. Finally, we use the notation $0 \le a \perp b \ge 0$ to indicate $a \ge 0$, $b \ge 0$, $a \times b = 0$. The main notations used in this paper are listed in Table I.
TABLE I
A List of Notations

Symbol                 Description
$\mathcal{K}$          The set of cells
$\mathcal{I}$          The set of all users
$\mathcal{Q}$          The set of all BSs
$\mathcal{Q}_k$        The set of BSs in cell $k$
$\mathcal{I}_k$        The set of users in cell $k$
$M$                    The number of transmit antennas per BS
$N$                    The number of receive antennas per user
$d_{i_k}$              The number of transmitted data streams for user $i_k$
$H_{i_k}^{q_\ell}$     The channel between user $i_k$ and BS $q_\ell$
$V_{i_k}^{q_k}$        The transmit beamformer from BS $q_k$ to user $i_k$
$U_{i_k}$              The receive MMSE matrix for user $i_k$


II. System Model and Problem Formulation 

We consider a downlink multi-cell HetNet consisting of a set $\mathcal{K}=\{1,\cdots,K\}$ of cells. Within each cell $k$ there is a set $\mathcal{Q}_k=\{1,\cdots,Q_k\}$ of distributed base stations (BSs), such as macro/micro/pico BSs, which provide service to users located in different areas of the cell. Assume that in each cell $k$ there is a low-latency backhaul network connecting the set of BSs $\mathcal{Q}_k$ to a central controller (usually the macro BS), which makes the resource allocation decisions for all BSs within the cell. Furthermore, this central entity has access to the data signals of all the users in its cell. Let $\mathcal{I}_k=\{1,\cdots,I_k\}$ denote the users associated with cell $k$. Each user $i_k\in\mathcal{I}_k$ is served jointly by a subset of the BSs in $\mathcal{Q}_k$. Let $\mathcal{I}$ and $\mathcal{Q}$ denote the set of all the users and the set of all the BSs, respectively. Assume that each BS has $M$ transmit antennas and each user has $N$ receive antennas. Let $H_{i_k}^{q_\ell}\in\mathbb{C}^{N\times M}$ denote the channel matrix between the $q$-th BS in the $\ell$-th cell and the $i$-th user in the $k$-th cell. Similarly, we use $H_{i_k}^{\ell}$ to denote the channel matrix between all the BSs in the $\ell$-th cell and user $i_k$, i.e., $H_{i_k}^{\ell}=[H_{i_k}^{q_\ell}]_{q_\ell\in\mathcal{Q}_\ell}\in\mathbb{C}^{N\times MQ_\ell}$.

Suppose that it is possible to transmit $d_{i_k}\le\min\{M,N\}$ parallel data streams to user $i_k$. Let $V_{i_k}^{q_k}\in\mathbb{C}^{M\times d_{i_k}}$ denote the transmit precoder that BS $q_k$ uses to transmit the data $s_{i_k}\in\mathbb{C}^{d_{i_k}}$ to user $i_k$. Define $V_{i_k}\triangleq[V_{i_k}^{q_k}]_{q_k\in\mathcal{Q}_k}\in\mathbb{C}^{MQ_k\times d_{i_k}}$, $V^{q_k}\triangleq\{V_{i_k}^{q_k}\}_{i_k\in\mathcal{I}_k}$ and $V_k\triangleq\{V_{i_k}\}_{i_k\in\mathcal{I}_k}$ as the collection of all precoders intended for user $i_k$, the collection of all the precoders belonging to BS $q_k$, and the collection of all precoders in cell $k$, respectively. Let $V\triangleq\{V_k\}_{k\in\mathcal{K}}$.

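Using the above definitions, we can express the transmitted signal of BS $q_k$ as well as the combined transmitted signal of all the BSs in cell $k$ as

$$x^{q_k}=\sum_{i_k\in\mathcal{I}_k}V_{i_k}^{q_k}s_{i_k},\qquad x_k=\sum_{i_k\in\mathcal{I}_k}V_{i_k}s_{i_k}. \tag{1}$$

The received signal $y_{i_k}\in\mathbb{C}^{N}$ of user $i_k$ is

$$y_{i_k}=H_{i_k}^{k}V_{i_k}s_{i_k}+\underbrace{\sum_{j_k\ne i_k}H_{i_k}^{k}V_{j_k}s_{j_k}}_{\text{intra-cell interference}}+\underbrace{\sum_{\ell\ne k}\sum_{j_\ell\in\mathcal{I}_\ell}H_{i_k}^{\ell}V_{j_\ell}s_{j_\ell}}_{\text{inter-cell interference}}+z_{i_k}, \tag{2}$$

where $z_{i_k}\in\mathbb{C}^{N}$ is the additive white complex Gaussian noise with distribution $\mathcal{CN}(0,\sigma_{i_k}^2 I_N)$.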

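Let $U_{i_k}\in\mathbb{C}^{N\times d_{i_k}}$ denote the linear receiver used by user $i_k$ to decode the intended signal. Then the estimated signal for user $i_k$ is $\hat{s}_{i_k}=U_{i_k}^H y_{i_k}$. With independent, unit-power data symbols (i.e., $\mathbb{E}[s_{i_k}s_{i_k}^H]=I_{d_{i_k}}$), the mean square error (MSE) matrix for user $i_k$ can be written as

$$E_{i_k}\triangleq\mathbb{E}\big[(\hat{s}_{i_k}-s_{i_k})(\hat{s}_{i_k}-s_{i_k})^H\big].$$

The minimum MSE (MMSE) receiver minimizes user $i_k$'s MSE and can be expressed as [?]

$$U_{i_k}^{\rm mmse}=C_{i_k}^{-1}H_{i_k}^{k}V_{i_k}, \tag{3}$$

where $C_{i_k}\triangleq\sum_{\ell\in\mathcal{K}}\sum_{j_\ell\in\mathcal{I}_\ell}H_{i_k}^{\ell}V_{j_\ell}V_{j_\ell}^H(H_{i_k}^{\ell})^H+\sigma_{i_k}^2I_N$ denotes user $i_k$'s received signal covariance matrix. When the MMSE receiver is used, the MSE matrix $E_{i_k}$ reduces to

$$E_{i_k}^{\rm mmse}=I_{d_{i_k}}-V_{i_k}^H(H_{i_k}^{k})^H C_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\succ 0. \tag{4}$$

Clearly we also have $I_{d_{i_k}}-E_{i_k}^{\rm mmse}\succeq 0$.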

Let us assume that Gaussian signaling is used and interference is treated as noise. Further assume that all the BSs in cell $k$ form a single virtual BS that jointly transmits to user $i_k\in\mathcal{I}_k$; then $V_{i_k}$ can be viewed as the virtual precoder for user $i_k$. The achievable rate for user $i_k$ is given by [46]

$$R_{i_k}=\log\Big|I_N+H_{i_k}^{k}V_{i_k}V_{i_k}^H(H_{i_k}^{k})^H\big(C_{i_k}-H_{i_k}^{k}V_{i_k}V_{i_k}^H(H_{i_k}^{k})^H\big)^{-1}\Big| \tag{5}$$

$$\quad\;=-\log\big|E_{i_k}^{\rm mmse}\big|, \tag{6}$$

where the last equality is the well-known relationship between the transmission rate and the MMSE matrix (see, e.g., [22]); it can be derived using the matrix inversion lemma. We will occasionally use the notations $R_{i_k}(V)$, $C_{i_k}(V)$ and $E_{i_k}^{\rm mmse}(V)$ to make their dependence on $V$ explicit.
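The identity between (5) and (6) can be checked numerically. The following Python/NumPy sketch is only an illustration under stated assumptions: the sizes are arbitrary, all interference received by user $i_k$ is lumped into one random matrix `G` (a hypothetical stand-in for the interference terms of (2)), and rates are measured in nats.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Circularly-symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, MQ, d, sigma2 = 4, 6, 2, 0.5      # hypothetical sizes and noise power
H = crandn(N, MQ)                    # H_{i_k}^k: channel from the virtual BS of cell k
V = crandn(MQ, d)                    # V_{i_k}: virtual precoder of user i_k
G = crandn(N, 3)                     # hypothetical lumped interference: G G^H

S = H @ V @ V.conj().T @ H.conj().T             # desired-signal covariance
C = S + G @ G.conj().T + sigma2 * np.eye(N)     # received covariance C_{i_k}

# Rate (5): log|I + S (C - S)^{-1}|, in nats.
rate_5 = np.linalg.slogdet(np.eye(N) + S @ np.linalg.inv(C - S))[1]

# MMSE receiver (3) and MMSE matrix (4); rate (6) is -log|E|.
U = np.linalg.solve(C, H @ V)
E = np.eye(d) - V.conj().T @ H.conj().T @ U
rate_6 = -np.linalg.slogdet(E)[1]

print(rate_5, rate_6)   # the two values agree up to rounding
```

The check relies only on $|I+S(C-S)^{-1}|=|C|/|C-S|$ and $|E^{\rm mmse}_{i_k}|=|C-S|/|C|$, which is the matrix-inversion-lemma argument mentioned above.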

Let $f_{i_k}(\cdot):\mathbb{R}_+\to\mathbb{R}$ be the utility function of user $i_k$'s data rate. We make the following assumptions on the function $f_{i_k}(\cdot)$:

A-1) $f_{i_k}(x)$ is a concave non-decreasing function of $x$ for all $x\ge 0$;

A-2) $f_{i_k}(-\log|X|)$ is convex in $X$ for all $I\succeq X\succ 0$;

A-3) $f_{i_k}(x)$ is continuously differentiable (i.e., a smooth function).

Note that this family of utility functions includes well-known utilities such as the weighted sum rate, the geometric mean of one plus rate and the harmonic mean rate utility functions (see [44]). They differ considerably from those studied in references [?]–[51], which, although they admit concave representations, are not directly related to the individual users' rates.

Let $s_{i_k}^{q_k}(\cdot)$ be a penalty term on the precoder $V_{i_k}^{q_k}$. We make the following assumptions on the function $s_{i_k}^{q_k}(\cdot)$:

B-1) $s_{i_k}^{q_k}(V_{i_k}^{q_k})$ is a convex function;

B-2) $s_{i_k}^{q_k}(V_{i_k}^{q_k})$ is continuous, but possibly nonsmooth.

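In this paper, we consider the general system-level sum-utility maximization problem in the following form:

$$\begin{aligned}
\max_{V}\ \ & u(V) \qquad\qquad \text{(SYSTEM)}\\
\text{s.t.}\ \ & u(V)=f(V)-s(V),\\
& f(V)=\sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}f_{i_k}\big(R_{i_k}(V)\big),\\
& s(V)=\sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}\sum_{q_k\in\mathcal{Q}_k}s_{i_k}^{q_k}\big(V_{i_k}^{q_k}\big),\\
& V^{q_k}\in\mathcal{V}^{q_k},\ \forall\, q_k\in\mathcal{Q};\qquad V_k\in\mathcal{V}_k,\ \forall\, k\in\mathcal{K},
\end{aligned}$$

where $\mathcal{V}^{q_k}$ and $\mathcal{V}_k$ represent the feasible sets for BS $q_k$'s transmit precoder and for the collection of all precoders in cell $k$, respectively. Let $\mathcal{V}$ denote the feasible set for $V$. The penalty term in the objective, when properly designed, can induce certain desired structure in the resulting precoding matrices.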

When the functions $f(V)$ and $s(V)$ as well as the feasible sets $\{\mathcal{V}^{q_k}\}$ and $\{\mathcal{V}_k\}$ are properly specified, the general problem (SYSTEM) can cover a wide range of transceiver design problems in multi-cell networks. In the following, we list several of those problems that are of wide interest.

• MIMO IBC/IMAC/IC channels with inter-BS CB [10], [33], [?]: Each cell $k$ has a single BS serving all the users in $\mathcal{I}_k$. In this case no penalty term $s(V)$ is assumed in the objective, and the constraint set $\mathcal{V}^{q_k}$ becomes the same as the constraint set $\mathcal{V}_k$, which is given by the following sum-power constrained set:

$$\mathcal{V}_k=\Big\{V_k:\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[V_{i_k}V_{i_k}^H\big]\le\bar{P}_k\Big\}. \tag{7}$$



• Multi-cell MIMO network with intra-cell CoMP and inter-cell CB [53], [8]: The BSs in different cells cooperate at the precoder level, while the BSs in the same cell share the user data and perform joint transmission. Each BS has a separate power constraint:

$$\mathcal{V}^{q_k}=\Big\{V^{q_k}:\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[V_{i_k}^{q_k}(V_{i_k}^{q_k})^H\big]\le\bar{P}^{q_k}\Big\}. \tag{8}$$

This model generalizes those for linear precoder design in the IBC model (e.g., [39], [44]): when we view all the BSs in cell $k$ as a giant virtual BS, we recover the precoder design problem in the IBC model, except that the sum-power constraint (7) is replaced by a set of per-antenna-group power constraints (each small BS $q_k$ is viewed as a group of antennas of the virtual transmitter $k$).

• Multi-cell MIMO network with intra-cell partial CoMP and inter-cell CB [54]–[56]: As is well known from the literature (see, e.g., [?], [8]), performing full CoMP in each cell can greatly improve the overall spectrum efficiency, but at the cost of a high level of overhead in the backhaul network. A practical alternative that trades off spectrum efficiency against an acceptable overhead is partial CoMP, in which each user is served by only a few of the BSs in its cell rather than by all of them. In this case, the BSs in different cells cooperate at the precoder level, while the BSs in the same cell are grouped into different (possibly overlapping) small clusters, within which they fully cooperate for transmission (see Fig. 2 for an illustration). Besides precoder design, the cluster membership of the BSs then needs to be decided. This task can be performed jointly with precoder design by properly specifying the penalty term $s(V)$. The requirement that each user be served by a few BSs translates to the restriction that, for each BS $q_k\in\mathcal{Q}_k$, its precoder $V^{q_k}=\{V_{i_k}^{q_k}\}_{i_k\in\mathcal{I}_k}$ should contain only a few nonzero block components [54]. To induce such block sparsity, the penalty term can take the form

$$s_{i_k}^{q_k}\big(V_{i_k}^{q_k}\big)=\gamma_{i_k}^{q_k}\big\|V_{i_k}^{q_k}\big\|_2, \tag{9}$$

with $\gamma_{i_k}^{q_k}>0$ being some constant. See [54] and the references therein for the motivation of using (9). The constraint set for this problem is the same as that in (8).

• Single-cell MIMO network with intra-cell ZF [?]: All the BSs in the cell (say cell 1) jointly perform ZF precoding. Such a precoding technique is referred to as block-diagonalization (BD) precoding [19]. That is, the designed precoders ensure that each user receives its intended transmission free of interference. No penalty term $s(V)$ is present. The feasible sets become

$$\mathcal{V}_1=\Big\{V_1:\ H_{j_1}^{1}V_{i_1}(V_{i_1})^H(H_{j_1}^{1})^H=0,\ \forall\, j_1\ne i_1\Big\}, \tag{10}$$

$$\mathcal{V}^{q_1}=\Big\{V^{q_1}:\ \sum_{i_1\in\mathcal{I}_1}\mathrm{Tr}\big[V_{i_1}^{q_1}(V_{i_1}^{q_1})^H\big]\le\bar{P}^{q_1}\Big\},\ \forall\, q_1\in\mathcal{Q}_1. \tag{11}$$

We should note that the ZF scheme is a special case of the linear precoding schemes discussed above, as it imposes the additional "zero-forcing" structure on the precoders to be designed.

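• Multi-cell MIMO network with intra-cell ZF and inter-cell CB [23]: In this setting, all the BSs in the same cell jointly perform BD-ZF, while the BSs in different cells perform CB. The feasible sets are given as

$$\mathcal{V}_k=\Big\{V_k:\ H_{j_k}^{k}V_{i_k}(V_{i_k})^H(H_{j_k}^{k})^H=0,\ \forall\, j_k\ne i_k,\ j_k,i_k\in\mathcal{I}_k\Big\}, \tag{12}$$

$$\mathcal{V}^{q_k}=\Big\{V^{q_k}:\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[V_{i_k}^{q_k}(V_{i_k}^{q_k})^H\big]\le\bar{P}^{q_k}\Big\},\ \forall\, q_k\in\mathcal{Q}. \tag{13}$$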

• Interfering OFDMA network [29], [?]: Assume that each cell $k$ has only a single BS that serves a single user (denoted as user $k$). Suppose there is a total of $N$ orthogonal channels that can be used by all the cells. Then the traditional power control problem in an interfering OFDMA network is a special case of the precoder design problem, obtained by setting $V_k$ and $H_k^{\ell}$ to be diagonal for all $k,\ell\in\mathcal{K}$ and letting $M=d_k=N$ for all $k$. In this way, user $k$'s rate becomes

$$R_k=\sum_{n=1}^{N}\log\left(1+\frac{|H_k^{k}[n,n]|^2\,|V_k[n,n]|^2}{\sigma_k^2+\sum_{\ell\ne k}|H_k^{\ell}[n,n]|^2\,|V_\ell[n,n]|^2}\right), \tag{14}$$

where $|H_k^{\ell}[n,n]|^2$ represents the channel gain on channel $n$ from BS $\ell$ to user $k$, and $|V_k[n,n]|^2$ represents the transmit power of BS $k$ on channel $n$.
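To make the clustering penalty (9) concrete, the following sketch evaluates $s(V)$ for one cell and reads off which BSs end up serving each user. It is illustrative only: the toy dimensions and weights are hypothetical, BS indices are 0-based, and the block norm in (9) is interpreted as the Frobenius norm of the block, which is an assumption.

```python
import numpy as np

def clustering_penalty(V_blocks, gamma):
    """s(V) for one cell: sum over users i and BSs q of gamma[i][q] * ||V_i^q||.

    V_blocks[i][q] is the M x d block that BS q uses for user i.
    The block norm is taken as the Frobenius norm (assumed reading of (9)).
    """
    return sum(gamma[i][q] * np.linalg.norm(B, "fro")
               for i, row in enumerate(V_blocks) for q, B in enumerate(row))

def serving_clusters(V_blocks, tol=1e-9):
    """For each user, the (0-based) indices of the BSs whose block is nonzero."""
    return [[q for q, B in enumerate(row) if np.linalg.norm(B) > tol] for row in V_blocks]

# Hypothetical cell with 3 BSs (M = 2 antennas each) and 2 single-stream users.
M, d = 2, 1
Z = np.zeros((M, d))
V_blocks = [[np.ones((M, d)), Z, 2 * np.ones((M, d))],   # user 0 served by BSs 0 and 2
            [Z, np.ones((M, d)), Z]]                     # user 1 served by BS 1 only
gamma = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
print(clustering_penalty(V_blocks, gamma))   # 0.1 * (sqrt(2) + 2 sqrt(2) + sqrt(2))
print(serving_clusters(V_blocks))            # [[0, 2], [1]]
```

Because the penalty grows with the norm of every nonzero block but is nondifferentiable at zero, it favors solutions in which whole blocks vanish, which is what defines the clusters.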
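For the ZF settings (10) and (12), one standard way of constructing precoders that satisfy the zero-interference constraints is to restrict each user's precoder to the null space of the other users' stacked channels. The sketch below does this with an SVD for a single cell. It is a hypothetical helper, not the design procedure of this paper: it ignores the power constraints (11)/(13) and assumes the cell has enough transmit dimensions for every user.

```python
import numpy as np

def block_diagonalization(H_list, d_list, tol=1e-10):
    """Hypothetical BD helper for one cell: returns V_i with H_j V_i = 0 for j != i.

    H_list[i] : N_i x Mtot channel from all BSs of the cell to user i
    d_list[i] : number of streams of user i
    Columns of each V_i are orthonormal; power allocation is left out.
    """
    V = []
    for i, d in enumerate(d_list):
        others = np.vstack([Hj for j, Hj in enumerate(H_list) if j != i])
        _, s, Vh = np.linalg.svd(others)                 # full SVD: Vh is Mtot x Mtot
        rank = int(np.sum(s > tol * s.max()))
        null_basis = Vh[rank:].conj().T                 # basis of the null space of `others`
        if null_basis.shape[1] < d:
            raise ValueError("not enough transmit dimensions for BD")
        # Within that null space, keep the d strongest right singular directions of H_i.
        _, _, Wh = np.linalg.svd(H_list[i] @ null_basis)
        V.append(null_basis @ Wh[:d].conj().T)
    return V

# Hypothetical cell: 2 BSs with 4 antennas each (Mtot = 8), 3 users with N = 2, d = 2.
rng = np.random.default_rng(1)
H_list = [rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8)) for _ in range(3)]
V_bd = block_diagonalization(H_list, [2, 2, 2])
print(max(np.abs(H_list[j] @ V_bd[i]).max() for i in range(3) for j in range(3) if i != j))
```

The printed value is the largest residual leakage $|H_{j_1}^1V_{i_1}|$, which should be at the level of floating-point rounding.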
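The OFDMA special case can also be checked numerically: with diagonal channels and precoders, the per-channel rate formula (14) should coincide with $-\log|E_k^{\rm mmse}|$ obtained from (4) and (6). The sketch below compares the two for hypothetical random gains; rates are in nats and the noise power is the same on every channel, as in (14).

```python
import numpy as np

def ofdma_rates(gains, powers, sigma2):
    """Per-user rates (14), in nats.

    gains[k, l, n] : |H_k^l[n, n]|^2, gain on channel n from BS l to user k
    powers[l, n]   : |V_l[n, n]|^2, power of BS l on channel n
    sigma2[k]      : noise power of user k on each channel
    """
    K = gains.shape[0]
    rates = np.zeros(K)
    for k in range(K):
        interference = sum(gains[k, l] * powers[l] for l in range(K) if l != k)
        sinr = gains[k, k] * powers[k] / (sigma2[k] + interference)
        rates[k] = np.sum(np.log1p(sinr))
    return rates

def rate_via_mmse(k, Hd, Vd, sigma2):
    """-log|E_k^mmse| from (4) and (6), with diagonal matrices Hd[k][l] and Vd[l]."""
    N = Vd[k].shape[0]
    C = sigma2[k] * np.eye(N, dtype=complex)
    for l in range(len(Vd)):
        HV = Hd[k][l] @ Vd[l]
        C = C + HV @ HV.conj().T
    HVk = Hd[k][k] @ Vd[k]
    E = np.eye(N) - HVk.conj().T @ np.linalg.solve(C, HVk)
    return -np.linalg.slogdet(E)[1]

# Hypothetical example: K = 3 single-user cells sharing N = 4 channels.
rng = np.random.default_rng(7)
K, N = 3, 4
h = rng.standard_normal((K, K, N)) + 1j * rng.standard_normal((K, K, N))
v = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
sigma2 = np.full(K, 0.5)
Hd = [[np.diag(h[k, l]) for l in range(K)] for k in range(K)]
Vd = [np.diag(v[l]) for l in range(K)]
r14 = ofdma_rates(np.abs(h) ** 2, np.abs(v) ** 2, sigma2)
r_mmse = np.array([rate_via_mmse(k, Hd, Vd, sigma2) for k in range(K)])
print(r14, r_mmse)
```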




Fig. 2. A graphical illustration of partial CoMP with intra-cell BS clustering. In this figure, three overlapping groups are formed, which respectively contain BSs (1,2), (1,3) and (3,4).

Despite the wide applicability of (SYSTEM), solving it to global optimality is often very difficult, even without the penalty term $s(V)$. In fact, a set of recent works [14], [23], [57], [58] have rigorously established the level of difficulty of various forms of problem (SYSTEM) using computational complexity theory [59]. The main message is that, except for a very few cases such as the MISO min-utility maximization problem (i.e., with a single receive antenna at every user, which can be equivalently transformed into a convex problem), solving problem (SYSTEM) is generally NP-hard. We refer the readers to [4] for a summary of these complexity results. The NP-hardness of the problem indicates that, unless P = NP, no equivalent convex reformulation that can be solved efficiently exists for it in general. Thus the best that one can do is to seek efficient low-complexity algorithms that provide high-quality approximate solutions. Unfortunately, because the variables $\{V_k\}_{k\in\mathcal{K}}$ belonging to different cells are tightly coupled in the objective function via multi-user interference, even finding efficient (preferably distributed) approximation algorithms for this family of problems can be challenging.

III. A General Successive Convex Approximation Approach 

In this section, we present our main approach for computing a high-quality solution to the general problem (SYSTEM). As stated in the previous section, solving problem (SYSTEM) directly is difficult due to its non-convexity. Our approach is instead to solve a series of convex subproblems, each of which is a local approximation of (SYSTEM). A desirable feature of our approach is that each of the obtained subproblems is not only convex with respect to the transmit precoder $V$, but also completely separable among the precoders of different cells. Consequently, the computation of the resource allocation in different stages of the algorithm can be carried out independently and in parallel by the individual cells.

A. A Local Approximation for (SYSTEM)

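To approximate the objective function of (SYSTEM), we begin by deriving a simple local approximation for the individual users' utility functions $f_{i_k}(\cdot)$ using the convexity stated in Assumption A-2). More specifically, let $\bar V\in\mathcal{V}$ be a feasible solution to problem (SYSTEM), and let $\bar E_{i_k}=E_{i_k}^{\rm mmse}(\bar V)$ denote the MMSE matrix evaluated at $\bar V$. From the relationship (6) between the individual users' rates and their MMSE matrices, we can express $f_{i_k}(\cdot)$ as a function of $E_{i_k}^{\rm mmse}$. Using Assumption A-2), we have

$$\begin{aligned}
f_{i_k}\big(R_{i_k}(V)\big)&=f_{i_k}\big(-\log|E_{i_k}^{\rm mmse}|\big)\\
&\ge f_{i_k}\big(-\log|\bar E_{i_k}|\big)-f_{i_k}'\big(-\log|\bar E_{i_k}|\big)\,\mathrm{Tr}\big[(\bar E_{i_k})^{-1}(E_{i_k}^{\rm mmse}-\bar E_{i_k})\big]\\
&=c_{i_k}-\zeta_{i_k}\mathrm{Tr}\big[(\bar E_{i_k})^{-1}E_{i_k}^{\rm mmse}\big]\triangleq h_{i_k}\big(E_{i_k}^{\rm mmse};\bar E_{i_k}\big),
\end{aligned}\tag{15}$$

where the inequality follows from the first-order condition for the convex function $f_{i_k}(-\log|X|)$; $\zeta_{i_k}\triangleq f_{i_k}'(-\log|\bar E_{i_k}|)$ and $c_{i_k}\triangleq f_{i_k}(-\log|\bar E_{i_k}|)+\zeta_{i_k}d_{i_k}$ are two constants that do not depend on $E_{i_k}^{\rm mmse}$ or $V$, and $\zeta_{i_k}\ge 0$ due to the non-decreasing property of $f_{i_k}(\cdot)$ assumed in A-1).

Note that the function $h_{i_k}(E_{i_k}^{\rm mmse};\bar E_{i_k})$ is a locally approximated version of $f_{i_k}(-\log|E_{i_k}^{\rm mmse}|)$ at the point $\bar E_{i_k}$. In fact, the approximation is a global underestimate of $f_{i_k}(-\log|E_{i_k}^{\rm mmse}|)$ that is tight at $\bar E_{i_k}$. That is, $h_{i_k}(E_{i_k}^{\rm mmse};\bar E_{i_k})\le f_{i_k}(-\log|E_{i_k}^{\rm mmse}|)$ for all feasible $E_{i_k}^{\rm mmse}$ and $\bar E_{i_k}$, and $h_{i_k}(\bar E_{i_k};\bar E_{i_k})=f_{i_k}(-\log|\bar E_{i_k}|)$.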
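As a sanity check of (15), the sketch below takes the weighted sum-rate utility $f_{i_k}(x)=w\,x$, for which A-2) holds because $-\log|X|$ is convex and $\zeta_{i_k}=w$, and verifies numerically that the bound is a global underestimate that is tight at $\bar E_{i_k}$. The weight, the matrix size and the way matrices with $I\succeq X\succ 0$ are sampled are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, w = 3, 1.7                      # hypothetical stream count and rate weight

def random_mse_like(d):
    """Random Hermitian X with eigenvalues in [0.05, 1], so that I >= X > 0."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Q, _ = np.linalg.qr(A)                       # random unitary matrix
    lam = rng.uniform(0.05, 1.0, size=d)
    return (Q * lam) @ Q.conj().T                # Q diag(lam) Q^H

def utility(X):
    """f(-log|X|) with the weighted sum-rate utility f(x) = w * x."""
    return -w * np.linalg.slogdet(X)[1]

def h_bound(X, Xbar):
    """Right-hand side of (15): f(-log|Xbar|) - zeta * Tr[Xbar^{-1}(X - Xbar)]."""
    zeta = w                                     # f'(x) = w for this utility
    return utility(Xbar) - zeta * np.real(np.trace(np.linalg.solve(Xbar, X - Xbar)))

gaps = []
for _ in range(1000):
    X, Xbar = random_mse_like(d), random_mse_like(d)
    gaps.append(utility(X) - h_bound(X, Xbar))   # should be >= 0

Xbar = random_mse_like(d)
print(min(gaps), utility(Xbar) - h_bound(Xbar, Xbar))   # min gap >= 0; second value is 0
```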

Unfortunately, the above approximation does not by itself simplify the problem. Although the sum of the approximating functions, $\sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}h_{i_k}(E_{i_k}^{\rm mmse};\bar E_{i_k})$, can be viewed as a locally approximated version of $f(V)$ and is indeed convex (in fact linear) with respect to $\{E_{i_k}^{\rm mmse}\}$, it is still non-convex with respect to the system precoder $V$ (cf. (4)). Our next step is therefore to further approximate $h_{i_k}(E_{i_k}^{\rm mmse};\bar E_{i_k})$ by a concave function of $V$. To this end, we need a key technical lemma that exposes a hidden convexity property of the function $h_{i_k}(\cdot)$.



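Lemma 1: The function $\mathrm{Tr}\big[(\bar E_{i_k})^{-1}(V_{i_k})^H(H_{i_k}^{k})^H C_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\big]$ is jointly convex with respect to the pair of variables $(V_{i_k},C_{i_k})$ in the feasible region $\mathbb{C}^{MQ_k\times d_{i_k}}\times\mathbb{S}_{++}^{N}$.

Proof: Let us define

$$l_{i_k}(V_{i_k},C_{i_k})\triangleq\mathrm{Tr}\big[(\bar E_{i_k})^{-1}(V_{i_k})^H(H_{i_k}^{k})^H C_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\big].$$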
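Consider the epigraph of $l_{i_k}(V_{i_k},C_{i_k})$, i.e.,

$$\big\{(V_{i_k},C_{i_k},t)\ \big|\ l_{i_k}(V_{i_k},C_{i_k})\le t\big\}=\Big\{(V_{i_k},C_{i_k},t)\ \Big|\ \mathrm{Tr}\big[\bar E_{i_k}^{-1/2}V_{i_k}^H(H_{i_k}^{k})^HC_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\bar E_{i_k}^{-1/2}\big]\le t\Big\}.$$

It suffices to show that this epigraph is a convex set [60, Chapter 3]. To this end, let us consider the following extended set (with $Z_{i_k}\in\mathbb{S}^{d_{i_k}}$ being a slack variable):

$$\Big\{(V_{i_k},C_{i_k},Z_{i_k},t)\ \Big|\ \mathrm{Tr}[Z_{i_k}]\le t,\ Z_{i_k}\succeq\bar E_{i_k}^{-1/2}V_{i_k}^H(H_{i_k}^{k})^HC_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\bar E_{i_k}^{-1/2},\ Z_{i_k}\succeq 0\Big\}. \tag{16}$$

The epigraph of $l_{i_k}$ is just the projection of the set (16) onto the coordinates $(V_{i_k},C_{i_k},t)$: for any point of the epigraph, choosing $Z_{i_k}$ equal to the matrix on the right-hand side of the second constraint gives a point of (16). Therefore, if the extended set (16) is convex, then the epigraph is also convex. Applying the Schur complement (recall that $C_{i_k}\succ 0$) to the constraint $Z_{i_k}\succeq\bar E_{i_k}^{-1/2}V_{i_k}^H(H_{i_k}^{k})^HC_{i_k}^{-1}H_{i_k}^{k}V_{i_k}\bar E_{i_k}^{-1/2}$, we have

$$\begin{bmatrix} Z_{i_k} & \bar E_{i_k}^{-1/2}V_{i_k}^H(H_{i_k}^{k})^H\\ H_{i_k}^{k}V_{i_k}\bar E_{i_k}^{-1/2} & C_{i_k}\end{bmatrix}\succeq 0. \tag{17}$$

Substituting (17) into (16) yields

$$\left\{(V_{i_k},C_{i_k},Z_{i_k},t)\ \middle|\ \mathrm{Tr}[Z_{i_k}]\le t,\ Z_{i_k}\succeq 0,\ \begin{bmatrix} Z_{i_k} & \bar E_{i_k}^{-1/2}V_{i_k}^H(H_{i_k}^{k})^H\\ H_{i_k}^{k}V_{i_k}\bar E_{i_k}^{-1/2} & C_{i_k}\end{bmatrix}\succeq 0\right\},$$

which is clearly a convex set whenever $C_{i_k}\succ 0$, since all of its constraints are linear (matrix) inequalities in $(V_{i_k},C_{i_k},Z_{i_k},t)$.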

We remark that a direct proof of this lemma can be obtained by evaluating the second-order directional derivative of the function $l_{i_k}$. However, this direct approach is less concise; we include it in Appendix A for completeness. ■
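Lemma 1 can also be probed numerically. The following sketch samples random pairs $(V_{i_k},C_{i_k})$ with $C_{i_k}\succ 0$ and checks the convexity inequality $l(\theta x_1+(1-\theta)x_2)\le\theta\,l(x_1)+(1-\theta)\,l(x_2)$ for the function $l_{i_k}$ defined in the proof. This is only a spot check on random data with hypothetical sizes, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
N, MQ, d = 3, 4, 2                    # hypothetical sizes

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_pd(n):
    """Random Hermitian positive definite matrix."""
    A = crandn(n, n)
    return A @ A.conj().T + 0.1 * np.eye(n)

H = crandn(N, MQ)                             # H_{i_k}^k
Ebar_inv = np.linalg.inv(random_pd(d))        # (Ebar_{i_k})^{-1}, Hermitian PD

def l_fun(V, C):
    """l(V, C) = Tr[Ebar^{-1} V^H H^H C^{-1} H V] from the proof of Lemma 1."""
    HV = H @ V
    return np.real(np.trace(Ebar_inv @ HV.conj().T @ np.linalg.solve(C, HV)))

violations = 0
for _ in range(2000):
    V1, V2 = crandn(MQ, d), crandn(MQ, d)
    C1, C2 = random_pd(N), random_pd(N)
    theta = rng.uniform()
    lhs = l_fun(theta * V1 + (1 - theta) * V2, theta * C1 + (1 - theta) * C2)
    rhs = theta * l_fun(V1, C1) + (1 - theta) * l_fun(V2, C2)
    if lhs > rhs + 1e-9 * max(1.0, abs(rhs)):  # small relative tolerance for rounding
        violations += 1
print(violations)   # 0 when joint convexity holds on all sampled pairs
```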

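The result provided by Lemma 1 allows us to further approximate the function $h_{i_k}(E_{i_k}^{\rm mmse};\bar E_{i_k})$. Specifically, define $\bar C_{i_k}=C_{i_k}(\bar V)$ as user $i_k$'s received covariance evaluated at the point $\bar V$. Then we have

$$\begin{aligned}
l_{i_k}(V_{i_k},C_{i_k})&\ge l_{i_k}(\bar V_{i_k},\bar C_{i_k})+\frac{d\,l_{i_k}\big(\bar V_{i_k}+t(V_{i_k}-\bar V_{i_k}),\,\bar C_{i_k}+t(C_{i_k}-\bar C_{i_k})\big)}{dt}\bigg|_{t=0}\\
&=\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\bar V_{i_k}^H(H_{i_k}^{k})^H\bar C_{i_k}^{-1}H_{i_k}^{k}\bar V_{i_k}\big]+\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\bar V_{i_k}^H(H_{i_k}^{k})^H\bar C_{i_k}^{-1}H_{i_k}^{k}(V_{i_k}-\bar V_{i_k})\big]\\
&\quad+\mathrm{Tr}\big[(\bar E_{i_k})^{-1}(V_{i_k}-\bar V_{i_k})^H(H_{i_k}^{k})^H\bar C_{i_k}^{-1}H_{i_k}^{k}\bar V_{i_k}\big]-\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\bar V_{i_k}^H(H_{i_k}^{k})^H\bar C_{i_k}^{-1}(C_{i_k}-\bar C_{i_k})\bar C_{i_k}^{-1}H_{i_k}^{k}\bar V_{i_k}\big],
\end{aligned}$$

where the inequality is due to the first-order condition of the jointly convex function $l_{i_k}$, and the directional derivatives can be obtained similarly to those given in Appendix A. Combining the above inequality with the definition of the MMSE matrix in (4) yields

$$\begin{aligned}
h_{i_k}\big(E_{i_k}^{\rm mmse};\bar E_{i_k}\big)&=c_{i_k}-\zeta_{i_k}\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\big]+\zeta_{i_k}\,l_{i_k}(V_{i_k},C_{i_k})\\
&\ge \tilde c_{i_k}+\zeta_{i_k}\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\big(\bar U_{i_k}^HH_{i_k}^{k}V_{i_k}+V_{i_k}^H(H_{i_k}^{k})^H\bar U_{i_k}\big)\big]-\zeta_{i_k}\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\bar U_{i_k}^HC_{i_k}\bar U_{i_k}\big]\triangleq h_{i_k}(V;\bar V),
\end{aligned}\tag{18}$$

where the equality uses the definition (4) of the MMSE matrix, $C_{i_k}=C_{i_k}(V)$ is the received covariance at $V$, we have defined $\bar U_{i_k}\triangleq\bar C_{i_k}^{-1}H_{i_k}^{k}\bar V_{i_k}$, and $\tilde c_{i_k}\triangleq c_{i_k}-\zeta_{i_k}\mathrm{Tr}[(\bar E_{i_k})^{-1}]$ is a constant that does not depend on $V$. (The terms of the linearization that involve only $\bar V_{i_k}$ and $\bar C_{i_k}$ cancel, because $\mathrm{Tr}[(\bar E_{i_k})^{-1}\bar U_{i_k}^HH_{i_k}^{k}\bar V_{i_k}]=\mathrm{Tr}[(\bar E_{i_k})^{-1}\bar U_{i_k}^H\bar C_{i_k}\bar U_{i_k}]=l_{i_k}(\bar V_{i_k},\bar C_{i_k})$.) Interestingly, $\bar U_{i_k}$ defined in this way is exactly the MMSE receiver for user $i_k$ when the system precoder is $\bar V$ (cf. (3)).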

Combining (15) and (18), we see that $h_{i_k}(V;\bar V)$ is a locally approximated version of $f_{i_k}(R_{i_k}(V))$ that satisfies the following two properties for all feasible $V$ and $\bar V$:

$$h_{i_k}(\bar V;\bar V)=f_{i_k}\big(R_{i_k}(\bar V)\big),\qquad h_{i_k}(V;\bar V)\le f_{i_k}\big(R_{i_k}(V)\big). \tag{19}$$

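Let us define $h(V;\bar V)\triangleq\sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}h_{i_k}(V;\bar V)$. The above results further imply that, for all feasible $V$ and $\bar V$,

$$h(\bar V;\bar V)=f(\bar V), \tag{20a}$$

$$h(V;\bar V)\le f(V). \tag{20b}$$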

That is, the function $h(V;\bar V)$ is a locally tight, global lower bound for the sum-utility function $f(V)$. A direct consequence of this observation is that $h(V;\bar V)-s(V)$ is a global lower bound for $u(V)$. Importantly, $h(V;\bar V)$ is a concave function of the variables $V$. To see this, we can expand $h(V;\bar V)$ explicitly using the definition of $C_{i_k}$ and rearrange terms to obtain



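$$h(V;\bar V)=\sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}\underbrace{\Big(\zeta_{i_k}\mathrm{Tr}\big[(\bar E_{i_k})^{-1}\big(\bar U_{i_k}^HH_{i_k}^{k}V_{i_k}+V_{i_k}^H(H_{i_k}^{k})^H\bar U_{i_k}\big)\big]-\mathrm{Tr}\big[V_{i_k}^HJ_kV_{i_k}\big]\Big)}_{\triangleq\,g_{i_k}(V_{i_k};\bar V),\ \text{only related to }V_{i_k}}+\text{const}, \tag{21}$$

where const collects the terms $\tilde c_{i_k}-\zeta_{i_k}\sigma_{i_k}^2\mathrm{Tr}[(\bar E_{i_k})^{-1}\bar U_{i_k}^H\bar U_{i_k}]$, which do not depend on $V$, and where we have defined

$$J_k\triangleq\sum_{\ell\in\mathcal{K}}\sum_{j_\ell\in\mathcal{I}_\ell}\zeta_{j_\ell}(H_{j_\ell}^{k})^H\bar U_{j_\ell}(\bar E_{j_\ell})^{-1}\bar U_{j_\ell}^HH_{j_\ell}^{k}\in\mathbb{S}_{+}^{MQ_k}. \tag{22}$$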

It is now clear that $-h(V;\bar V)$ is a convex quadratic function of $V$ (note that $J_k\succeq 0$ because $\zeta_{j_\ell}\ge 0$), hence $h(V;\bar V)$ is concave. Surprisingly, $h(V;\bar V)$ can also be written as a sum of the functions $\{g_{i_k}(V_{i_k};\bar V)\}_{i_k\in\mathcal{I}}$ plus a constant, each of which depends on only a single variable in the set $\{V_{i_k}\}_{i_k\in\mathcal{I}}$. This observation shows that our series of approximations not only turns the maximization of $u(V)$ into convex problems, but, more importantly, decomposes the non-linearly coupled objective function $u(V)$ into a sum of functions that decouple across the variables. This property of the sum-utility function $u(V)$ is one of the main results of this work, and will be leveraged for designing efficient distributed algorithms.
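The properties (19)–(20) and the decomposition (21)–(22) can be verified on a small example. The sketch below builds a hypothetical network in which every cell is treated as one virtual BS, uses the weighted sum-rate utility (so that $\zeta_{i_k}=w_{i_k}$), and evaluates the lower bound both directly from (18) and in the decomposed form (21)–(22). All sizes, weights and channels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
K, I, N, MQ, d, sigma2 = 2, 2, 2, 4, 2, 0.3   # hypothetical toy network
w = rng.uniform(0.5, 2.0, size=(K, I))        # weighted sum-rate, so zeta = w

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# H[k][i][l]: channel from the (virtual) BS of cell l to user i of cell k.
H = [[[crandn(N, MQ) for l in range(K)] for i in range(I)] for k in range(K)]

def cov(V, k, i):
    """Received covariance C_{i_k}(V), cf. (24a)."""
    C = sigma2 * np.eye(N, dtype=complex)
    for l in range(K):
        for Vj in V[l]:
            HV = H[k][i][l] @ Vj
            C = C + HV @ HV.conj().T
    return C

def mmse(V, k, i):
    """MMSE receiver (3) and MMSE matrix (4) of user i_k."""
    U = np.linalg.solve(cov(V, k, i), H[k][i][k] @ V[k][i])
    return U, np.eye(d) - V[k][i].conj().T @ H[k][i][k].conj().T @ U

def f_sum(V):
    """f(V) = sum of w * R, with R = -log|E| as in (6)."""
    return sum(-w[k, i] * np.linalg.slogdet(mmse(V, k, i)[1])[1]
               for k in range(K) for i in range(I))

def h_direct(V, Vbar):
    """Lower bound (18), summed over users, built at the point Vbar."""
    total = 0.0
    for k in range(K):
        for i in range(I):
            U, E = mmse(Vbar, k, i)
            Einv, z = np.linalg.inv(E), w[k, i]
            c_tilde = -z * np.linalg.slogdet(E)[1] + z * d - z * np.real(np.trace(Einv))
            lin = 2 * np.real(np.trace(Einv @ U.conj().T @ H[k][i][k] @ V[k][i]))
            quad = np.real(np.trace(Einv @ U.conj().T @ cov(V, k, i) @ U))
            total += c_tilde + z * (lin - quad)
    return total

def h_decomposed(V, Vbar):
    """The same bound in the form (21)-(22): sum of g_{i_k}(V_{i_k}) plus a constant."""
    pts = {(k, i): mmse(Vbar, k, i) for k in range(K) for i in range(I)}
    J = [sum(w[k, i] * H[k][i][l].conj().T @ U @ np.linalg.inv(E) @ U.conj().T @ H[k][i][l]
             for (k, i), (U, E) in pts.items()) for l in range(K)]
    total = 0.0
    for (k, i), (U, E) in pts.items():
        Einv, z = np.linalg.inv(E), w[k, i]
        g = (2 * z * np.real(np.trace(Einv @ U.conj().T @ H[k][i][k] @ V[k][i]))
             - np.real(np.trace(V[k][i].conj().T @ J[k] @ V[k][i])))
        const = (-z * np.linalg.slogdet(E)[1] + z * d - z * np.real(np.trace(Einv))
                 - z * sigma2 * np.real(np.trace(Einv @ U.conj().T @ U)))
        total += g + const
    return total

def rand_V():
    return [[crandn(MQ, d) for i in range(I)] for k in range(K)]

Vbar, V = rand_V(), rand_V()
print(f_sum(Vbar) - h_direct(Vbar, Vbar))          # ~0: tightness (20a)
print(f_sum(V) - h_direct(V, Vbar))                # >= 0: lower bound (20b)
print(h_direct(V, Vbar) - h_decomposed(V, Vbar))   # ~0: decomposition (21)
```

In `h_decomposed`, each term `g` touches only one precoder `V[k][i]`, which is the separability that the distributed algorithms exploit.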

B. A General Successive Convex Approximation Algorithm 

The lower bounds developed in the previous subsection are crucial for our design of efficient low-complexity algorithms that optimize the original problem (SYSTEM). In this subsection, we develop the algorithm in its most general form. In the sections that follow, we will see how this algorithm can be efficiently implemented and tailored to special cases of (SYSTEM).

Our approach is to successively approximate $f(V)$ using $h(V;\bar V)$ to obtain progressively improved solutions. Let $t$ denote the iteration index. The proposed algorithm, referred to as the successive convex approximation (SCA) algorithm, consists of the following three main steps.

Step 1: Suppose $V(t-1)$ is a feasible solution to (SYSTEM). In iteration $t$, solve the following convex optimization problem to obtain $V(t)$:

$$\begin{aligned}\max_{V}\ \ & h\big(V;V(t-1)\big)-s(V) \qquad\qquad \text{(Lower-Bound)}\\ \text{s.t.}\ \ & V^{q_k}\in\mathcal{V}^{q_k},\ V_k\in\mathcal{V}_k,\ \forall\, q_k\in\mathcal{Q},\ k\in\mathcal{K}.\end{aligned}$$

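Step 2: For each user $i_k\in\mathcal{I}$, compute

$$C_{i_k}(t)=\sum_{\ell\in\mathcal{K}}\sum_{j_\ell\in\mathcal{I}_\ell}H_{i_k}^{\ell}V_{j_\ell}(t)V_{j_\ell}^H(t)(H_{i_k}^{\ell})^H+\sigma_{i_k}^2I_N, \tag{24a}$$

$$U_{i_k}(t)=\big(C_{i_k}(t)\big)^{-1}H_{i_k}^{k}V_{i_k}(t), \tag{24b}$$

$$E_{i_k}(t)=I_{d_{i_k}}-V_{i_k}^H(t)(H_{i_k}^{k})^H\big(C_{i_k}(t)\big)^{-1}H_{i_k}^{k}V_{i_k}(t), \tag{24c}$$

$$\zeta_{i_k}(t)=f_{i_k}'\big(-\log|E_{i_k}(t)|\big). \tag{24d}$$

Note that when $f(V)$ takes the form of the popular weighted sum-rate, $\zeta_{i_k}(t)$ is always a constant (equal to the weight of user $i_k$), and step (24d) does not need to be performed.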
Step 3: Compute the updated lower bound function $h(V;V(t))$ according to (21). Let $t=t+1$, and go to Step 1.
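Step 2 involves only per-user quantities, which can be computed from the new precoders in one pass. The function below is a minimal sketch of (24a)–(24d). It assumes a hypothetical nested-list layout, `H[k][i][l]` for $H_{i_k}^{\ell}$ and `V[k][i]` for $V_{i_k}$, and takes each user's derivative $f_{i_k}'$ as a callable.

```python
import numpy as np

def step2_updates(H, V, sigma2, fprime):
    """Evaluate (24a)-(24d) for every user, given the new precoders V(t).

    H[k][i][l]   : N x (M Q_l) channel from the BSs of cell l to user i of cell k
    V[k][i]      : (M Q_k) x d precoder of user i in cell k
    sigma2[k][i] : noise power of user i_k
    fprime[k][i] : derivative of user i_k's utility, as a callable
    Returns nested lists C, U, E, zeta indexed like V.
    """
    C, U, E, zeta = [], [], [], []
    for k in range(len(V)):
        Ck, Uk, Ek, zk = [], [], [], []
        for i in range(len(V[k])):
            N = H[k][i][k].shape[0]
            c = sigma2[k][i] * np.eye(N, dtype=complex)       # noise part of (24a)
            for l in range(len(V)):                           # all cells, all users
                for Vj in V[l]:
                    HV = H[k][i][l] @ Vj
                    c = c + HV @ HV.conj().T
            HVk = H[k][i][k] @ V[k][i]
            u = np.linalg.solve(c, HVk)                       # (24b)
            e = np.eye(HVk.shape[1]) - HVk.conj().T @ u       # (24c)
            zk.append(fprime[k][i](-np.linalg.slogdet(e)[1])) # (24d), rate = -log|E|
            Ck.append(c); Uk.append(u); Ek.append(e)
        C.append(Ck); U.append(Uk); E.append(Ek); zeta.append(zk)
    return C, U, E, zeta

# Hypothetical example: 2 cells with 2 users each; cell 0 has M Q_0 = 4 transmit
# antennas in total and cell 1 has M Q_1 = 6; N = 2 and one stream per user.
rng = np.random.default_rng(6)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
MQ, N, d = [4, 6], 2, 1
H = [[[crandn(N, MQ[l]) for l in range(2)] for i in range(2)] for k in range(2)]
V = [[crandn(MQ[k], d) for i in range(2)] for k in range(2)]
w = [[1.0, 2.0], [0.5, 1.5]]
sigma2 = [[1.0, 1.0], [1.0, 1.0]]
# Weighted sum-rate: f(x) = w x, so f'(x) = w is a constant, as noted after (24d).
fprime = [[(lambda x, a=w[k][i]: a) for i in range(2)] for k in range(2)]
C, U, E, zeta = step2_updates(H, V, sigma2, fprime)
```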

A graphical illustration of the proposed algorithm is depicted in Fig. 3. It is important to note that efficiently solving the subproblem (Lower-Bound) in Step 1 is the key to a low-complexity implementation of the entire algorithm. Although this subproblem is convex and thus can be solved using general-purpose solvers (e.g., CVX [61]), in practice it is desirable to develop customized low-complexity algorithms tailored to problems with specific structures.
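As an example of a customized solver, consider the weighted sum-rate utility with one BS per cell and the sum-power set (7). By (21), (Lower-Bound) then separates into one problem per cell, $\max\sum_{i_k}g_{i_k}(V_{i_k};\bar V)$ subject to $\sum_{i_k}\mathrm{Tr}[V_{i_k}V_{i_k}^H]\le\bar P_k$, whose KKT conditions give $V_{i_k}=(J_k+\mu_kI)^{-1}\zeta_{i_k}(H_{i_k}^{k})^H\bar U_{i_k}(\bar E_{i_k})^{-1}$ for a multiplier $\mu_k\ge 0$ that can be found by bisection. The sketch below implements this derivation and runs the SCA iterations. It is one possible solver under these assumptions (network sizes, power budget and weights are hypothetical), not necessarily the algorithm developed later in the paper; in this special case the update closely resembles the WMMSE precoder update.

```python
import numpy as np

rng = np.random.default_rng(5)
K, I, M, N, d = 3, 2, 4, 2, 1     # hypothetical: 3 cells, 2 users each, one BS per cell
P, sigma2 = 10.0, 1.0             # per-cell power budget in (7) and noise power
w = np.ones((K, I))               # weighted sum-rate weights, so zeta = w

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = [[[crandn(N, M) for l in range(K)] for i in range(I)] for k in range(K)]

def mmse(V, k, i):
    """MMSE receiver (3) and MMSE matrix (4) of user i in cell k."""
    C = sigma2 * np.eye(N, dtype=complex)
    for l in range(K):
        for Vj in V[l]:
            HV = H[k][i][l] @ Vj
            C = C + HV @ HV.conj().T
    U = np.linalg.solve(C, H[k][i][k] @ V[k][i])
    return U, np.eye(d) - V[k][i].conj().T @ H[k][i][k].conj().T @ U

def wsr(V):
    """Weighted sum rate, using R = -log|E| from (6)."""
    return sum(-w[k, i] * np.linalg.slogdet(mmse(V, k, i)[1])[1]
               for k in range(K) for i in range(I))

def power(Vk):
    return sum(np.linalg.norm(v) ** 2 for v in Vk)

def solve_lower_bound(V_prev):
    """Maximize h(V; V_prev) over the sum-power sets (7), one cell at a time."""
    pts = {(k, i): mmse(V_prev, k, i) for k in range(K) for i in range(I)}
    V_new = []
    for k in range(K):
        # J_k from (22) and the linear-term matrices zeta * H^H U E^{-1}.
        J = sum(w[m, j] * H[m][j][k].conj().T @ U @ np.linalg.inv(E) @ U.conj().T @ H[m][j][k]
                for (m, j), (U, E) in pts.items())
        B = [w[k, i] * H[k][i][k].conj().T @ pts[k, i][0] @ np.linalg.inv(pts[k, i][1])
             for i in range(I)]
        precoders = lambda mu: [np.linalg.solve(J + mu * np.eye(M), b) for b in B]
        if power(precoders(0.0)) <= P:          # unconstrained maximizer is feasible
            V_new.append(precoders(0.0))
            continue
        lo, hi = 0.0, 1.0
        while power(precoders(hi)) > P:         # find a multiplier meeting the budget
            hi *= 2.0
        for _ in range(100):                    # bisection on mu_k
            mid = 0.5 * (lo + hi)
            if power(precoders(mid)) > P:
                lo = mid
            else:
                hi = mid
        V_new.append(precoders(hi))             # hi always respects the budget
    return V_new

# Feasible random start, scaled to use the full budget of each cell.
V = [[crandn(M, d) for i in range(I)] for k in range(K)]
V = [[v * np.sqrt(P / power(Vk)) for v in Vk] for Vk in V]
history = [wsr(V)]
for t in range(30):
    V = solve_lower_bound(V)
    history.append(wsr(V))
print(np.round(history[:5], 3))
```

The recorded sum-rate history should be nondecreasing, in line with the monotonicity argument (25) below; returning the upper end of the final bisection interval keeps every iterate within the power budget.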

In the following, we show that the proposed SCA algorithm converges to the set of stationary solutions of problem (SYSTEM). We note that the proof below differs slightly from a recent proof [62, Theorem 1], in which the convergence of a similar single-block successive approximation algorithm was shown with the variables assumed to be real vectors. Nevertheless, the main proof technique is the same as that of [62, Theorem 1]; consequently, the proof is included only for completeness.

To this end, the following definitions are needed: 

• Directional derivative: Let $f:\mathcal{V}\to\mathbb{R}$ be a function where $\mathcal{V}$ is a convex set. The directional derivative of $f$ at a point $x\in\mathcal{V}$ in direction $d$ is defined by

$$f'_{d}(x) \triangleq \liminf_{r\downarrow 0}\frac{f(x+rd)-f(x)}{r}.$$

• Stationary points of a function: Let $f:\mathcal{V}\to\mathbb{R}$ where $\mathcal{V}$ is a convex set. The point $x$ is a stationary point of $f(\cdot)$ if $f'_{d}(x)\le 0$ for all $d$ such that $x+d\in\mathcal{V}$.
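A small numerical check can make these two definitions concrete. The Python sketch below is illustrative only. It uses the toy function $f(x) = -x^4 + 2x^2$ on $\mathcal{V} = [-2, 2]$ (an assumption, not from the paper), approximates $f'_d(x)$ by a one-sided difference quotient with a small fixed $r$, and tests the maximization-type stationarity condition $f'_d(x) \le 0$. In one dimension, positive scaling of $d$ does not change the sign of $f'_d(x)$. It is therefore enough to test $d = +1$ when $x$ lies below the upper end and $d = -1$ when $x$ lies above the lower end. The step $r$ and the tolerance are arbitrary choices.

```python
def f(x):
    return -x**4 + 2 * x**2


LOWER, UPPER = -2.0, 2.0  # the convex set V = [-2, 2]


def directional_derivative(func, x, d, r=1e-7):
    """One-sided difference quotient (f(x + r d) - f(x)) / r approximating f'_d(x)."""
    return (func(x + r * d) - func(x)) / r


def is_stationary(func, x, tol=1e-4):
    """Check f'_d(x) <= 0 (up to tol) for every feasible direction sign (maximization)."""
    feasible_dirs = []
    if x < UPPER:
        feasible_dirs.append(1.0)
    if x > LOWER:
        feasible_dirs.append(-1.0)
    return all(directional_derivative(func, x, d) <= tol for d in feasible_dirs)


for point in (0.0, 0.5, 1.0, 2.0):
    print(point, is_stationary(f, point))
```

The point $x = 0$ passes the test even though it is a local minimum of $f$. Stationarity is only a first-order condition, which is why Theorem 1 guarantees convergence to stationary, not globally optimal, solutions.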




Fig. 3. A graphical illustration of the proposed SCA algorithm, assuming $s(\mathbf{V})$ is not present. At $\mathbf{V}(0)$, a concave function $h(\mathbf{V};\mathbf{V}(0))$ is used to approximate the original non-convex function $f(\mathbf{V})$. The optimal solution of $h(\mathbf{V};\mathbf{V}(0))$ is $\mathbf{V}(1)$. Then the concave function $h(\mathbf{V};\mathbf{V}(1))$ is constructed at the point $\mathbf{V}(1)$ and optimized to obtain $\mathbf{V}(2)$. Continuing this process, a stationary solution of the original non-convex problem can be found.

Theorem 1: Suppose the following conditions hold:

1) Assumptions A-1) to A-3) and B-1) to B-2) are all satisfied;

2) The convex problem (Lower-Bound) can be solved to its global optimality;

3) The sets $\{\mathcal{V}^k\}_{k\in\mathcal{K}}$ and $\{\mathcal{V}^{q_k}\}_{q_k\in\mathcal{Q}_k}$ are all convex, closed and compact.

Then the SCA algorithm converges globally to a stationary solution of problem (SYSTEM).



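Proof: First, it is easy to observe that the objective value of problem (SYSTEM) is monotonically nondecreasing, i.e., we have

$$u(\mathbf{V}(t+1)) \overset{(i)}{\ge} h(\mathbf{V}(t+1);\mathbf{V}(t)) - s(\mathbf{V}(t+1)) \overset{(ii)}{\ge} h(\mathbf{V}(t);\mathbf{V}(t)) - s(\mathbf{V}(t)) \overset{(iii)}{=} u(\mathbf{V}(t)) \quad (25)$$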



where step (i) follows from (20b); step (ii) follows from the fact that $\mathbf{V}(t+1)$ is the solution to problem (Lower-Bound); and step (iii) is due to (20a). As $f(\mathbf{V})$ and $s(\mathbf{V})$ are bounded over the feasible set, the nondecreasing sequence $\{u(\mathbf{V}(t))\}$ is bounded above and hence converges. Let $\bar{u}$ denote its limit. This result combined with (25) implies that

$$\lim_{t\to\infty} h(\mathbf{V}(t+1);\mathbf{V}(t)) - s(\mathbf{V}(t+1)) = \bar{u}. \quad (26)$$



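Using an argument similar to that in [62, Lemma 1], together with the fact that $f(\cdot)$ and $h(\cdot;\hat{\mathbf{V}})$ satisfy (20a)-(20b), we can show that for any feasible $\hat{\mathbf{V}}$ the directional derivative of $f(\cdot)$ at the point $\hat{\mathbf{V}}$ equals that of $h(\cdot;\hat{\mathbf{V}})$ at the point $\hat{\mathbf{V}}$, i.e.,

$$\lim_{r\downarrow 0}\frac{f(\hat{\mathbf{V}}+r\mathbf{D})-f(\hat{\mathbf{V}})}{r} = \lim_{r\downarrow 0}\frac{h(\hat{\mathbf{V}}+r\mathbf{D};\hat{\mathbf{V}})-h(\hat{\mathbf{V}};\hat{\mathbf{V}})}{r} \quad (27)$$

where $\mathbf{D}$ satisfies $\hat{\mathbf{V}}+\mathbf{D}\in\mathcal{V}$.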

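Let $\{\mathbf{V}(t_m)\}_{m=1}^{\infty}$ be a converging subsequence of $\{\mathbf{V}(t)\}$, and denote its limit as $\mathbf{V}^*$. In Step 1 of the algorithm, the optimality of $\mathbf{V}(t_m+1)$ for problem (Lower-Bound) implies that

$$h(\mathbf{V}(t_m+1);\mathbf{V}(t_m)) - s(\mathbf{V}(t_m+1)) \ge h(\mathbf{V};\mathbf{V}(t_m)) - s(\mathbf{V}),\quad \forall\,\mathbf{V}\in\mathcal{V}.$$

Taking the limit of both sides and using (26), we obtain $\bar{u}\ge h(\mathbf{V};\mathbf{V}^*) - s(\mathbf{V})$, $\forall\,\mathbf{V}\in\mathcal{V}$. Clearly, we have

$$h(\mathbf{V}^*;\mathbf{V}^*) - s(\mathbf{V}^*) = u(\mathbf{V}^*) = \bar{u}. \quad (28)$$

This implies that $h(\mathbf{V}^*;\mathbf{V}^*) - s(\mathbf{V}^*) \ge h(\mathbf{V};\mathbf{V}^*) - s(\mathbf{V})$ for all feasible $\mathbf{V}$, which further implies that

$$h'_{\mathbf{D}}(\mathbf{V};\mathbf{V}^*)\big|_{\mathbf{V}=\mathbf{V}^*} - s'_{\mathbf{D}}(\mathbf{V})\big|_{\mathbf{V}=\mathbf{V}^*} \le 0,\quad \forall\,\mathbf{D}+\mathbf{V}^*\in\mathcal{V}. \quad (29)$$

Utilizing (27), we obtain

$$u'_{\mathbf{D}}(\mathbf{V})\big|_{\mathbf{V}=\mathbf{V}^*} = f'_{\mathbf{D}}(\mathbf{V})\big|_{\mathbf{V}=\mathbf{V}^*} - s'_{\mathbf{D}}(\mathbf{V})\big|_{\mathbf{V}=\mathbf{V}^*} \le 0,\quad \forall\,\mathbf{D}+\mathbf{V}^*\in\mathcal{V}, \quad (30)$$

which says that $\mathbf{V}^*$ is a stationary solution to problem (SYSTEM).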
IV. Customized Algorithms for Precoder Design

In this section, we customize the general SCA algorithm proposed in the previous section to the precoder design problems in different network settings. The main focus is to exploit the structure of the different problems so that the convex subproblem (Lower-Bound) can be solved efficiently for each network scenario.

A. Linear Precoder Design for IBC Model 

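First, let us consider the IBC model, in which a single BS transmits in each cell. The objective is to design linear precoders for the users subject to the sum-power constraint of each BS. As $|\mathcal{Q}_k| = 1$ for all $k$, $\mathbf{V}_{i_k}$ denotes the transmit precoder used by the BS in cell $k$ to transmit to user $i_k$. In this case, there is no penalty term $s(\mathbf{V})$, and the sets $\mathcal{V}^k$ and $\mathcal{V}^{q_k}$ collapse to a single feasible set given by $\mathcal{V}^k = \{\mathbf{V}^k : \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}(\mathbf{V}_{i_k}\mathbf{V}^H_{i_k})\le P_k\}$. Hence the subproblem (Lower-Bound) is given by (expanding $h(\mathbf{V};\hat{\mathbf{V}})$ using $g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}})$; cf. (21))

$$\max_{\mathbf{V}}\ \sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k} g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}) \quad \text{s.t.}\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}(\mathbf{V}_{i_k}\mathbf{V}^H_{i_k})\le P_k,\ k\in\mathcal{K}. \qquad \text{(IBC)}$$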

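Clearly, both the constraints and the objective of the above problem are separable among the BSs (i.e., separable among the sets of variables $\mathbf{V}^k = \{\mathbf{V}_{i_k}\}_{i_k\in\mathcal{I}_k}$ for different $k\in\mathcal{K}$). As a result, we can decompose this problem into $K$ independent subproblems, with the $k$-th subproblem taking the following form

$$\max_{\mathbf{V}^k}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) \quad \text{s.t.}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) = \sum_{i_k\in\mathcal{I}_k} g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}),\quad \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}(\mathbf{V}_{i_k}\mathbf{V}^H_{i_k})\le P_k. \qquad \text{(IBC-SUB)}$$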

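Let $\lambda^k\ge 0$ denote the Lagrangian multiplier associated with the power constraint. Then the Lagrangian function for problem (IBC-SUB) can be expressed as (excluding constants that are not related to $(\lambda^k,\mathbf{V}^k)$)

$$L_k(\mathbf{V}^k,\lambda^k;\hat{\mathbf{V}}) = \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\Big[c_{i_k}\mathbf{E}^{-1}_{i_k}\big(\mathbf{U}^H_{i_k}\mathbf{H}^k_{i_k}\mathbf{V}_{i_k} + \mathbf{V}^H_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\big)\Big] - \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}^H_{i_k}\mathbf{J}^k\mathbf{V}_{i_k}\big] - \lambda^k\Big(\sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}_{i_k}\mathbf{V}^H_{i_k}\big] - P_k\Big), \quad (31)$$

where $\mathbf{J}^k$ is defined in (22).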
\ikeik I 

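The dual function is $d(\lambda^k;\hat{\mathbf{V}}) = \max_{\mathbf{V}^k} L_k(\mathbf{V}^k,\lambda^k;\hat{\mathbf{V}})$. The optimal primal-dual pair $((\mathbf{V}^k)^*,(\lambda^k)^*)$ should satisfy the KKT optimality conditions

$$(\mathbf{V}^k)^* = \arg\max_{\mathbf{V}^k} L_k(\mathbf{V}^k,(\lambda^k)^*;\hat{\mathbf{V}}), \quad (32a)$$

$$0 \le (\lambda^k)^* \perp P_k - \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}^*_{i_k}(\mathbf{V}^*_{i_k})^H\big] \ge 0. \quad (32b)$$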
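For fixed $\lambda^k\ge 0$, the solution of the unconstrained problem $\max_{\mathbf{V}^k} L_k(\mathbf{V}^k,\lambda^k;\hat{\mathbf{V}})$, denoted as $\{\mathbf{V}^*_{i_k}(\lambda^k)\}$, can be expressed as

$$\mathbf{V}^*_{i_k}(\lambda^k) = \big(\mathbf{J}^k + \lambda^k\mathbf{I}_M\big)^{-1}c_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\mathbf{E}^{-1}_{i_k} = \Big(\sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(\mathbf{H}^k_{j_\ell})^H\mathbf{U}_{j_\ell}\mathbf{E}^{-1}_{j_\ell}\mathbf{U}^H_{j_\ell}\mathbf{H}^k_{j_\ell} + \lambda^k\mathbf{I}_M\Big)^{-1}c_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\mathbf{E}^{-1}_{i_k},\quad \forall\, i_k\in\mathcal{I}_k. \quad (33)$$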

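To find the optimal multiplier $(\lambda^k)^*$ that satisfies the complementarity condition (32b), we use a general result on penalty methods for optimization, e.g., [63, Section 12.1, Lemma 1]. This result asserts that, for the solution $\{\mathbf{V}^*_{i_k}(\lambda^k)\}_{i_k\in\mathcal{I}_k}$, the penalized term $\sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}[\mathbf{V}^*_{i_k}(\lambda^k)(\mathbf{V}^*_{i_k}(\lambda^k))^H]$ is monotonically decreasing with respect to $\lambda^k$. This monotonicity suggests that the optimal multiplier can be found by a simple bisection search. If the power constraint is satisfied at $\lambda^k = 0$, then $(\lambda^k)^* = 0$. Otherwise, $(\lambda^k)^*$ is the value at which the constraint holds with equality.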

The algorithm discussed in this subsection is summarized in Table II. The small constant $\delta > 0$ in Step S4) is used to specify the stopping criterion.

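TABLE II

The Proposed Algorithm for (SYSTEM) in the IBC Setting

S1): Initialization: Obtain a feasible solution $\mathbf{V}_{i_k}(0)$ for all $i_k\in\mathcal{I}$;

S2): For each BS $k$, compute $\mathbf{C}_{i_k}(t)$, $\mathbf{E}_{i_k}(t)$, $\mathbf{U}_{i_k}(t)$ and $c_{i_k}(t)$ according to (24a)-(24d), for all $i_k\in\mathcal{I}_k$;

S3): For each BS $k$, compute the precoders $\mathbf{V}^k(t)$ by
$\mathbf{V}_{i_k}(t) = \big(\sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(t)(\mathbf{H}^k_{j_\ell})^H\mathbf{U}_{j_\ell}(t)(\mathbf{E}_{j_\ell}(t))^{-1}(\mathbf{U}_{j_\ell}(t))^H\mathbf{H}^k_{j_\ell} + (\lambda^k)^*\mathbf{I}_M\big)^{-1} \times c_{i_k}(t)(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(t)(\mathbf{E}_{i_k}(t))^{-1},\ \forall\, i_k\in\mathcal{I}_k$,
where $(\lambda^k)^*$ is computed by a bisection procedure;

S4): Until some stopping criterion is met.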



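Remark 1 (The bisection procedure): The optimal multiplier $(\lambda^k)^*$ can be computed as follows [34]. 1) Once, before the bisection starts, perform an eigen-decomposition $\sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(\mathbf{H}^k_{j_\ell})^H\mathbf{U}_{j_\ell}(\mathbf{E}_{j_\ell})^{-1}\mathbf{U}^H_{j_\ell}\mathbf{H}^k_{j_\ell} = \mathbf{X}_k\boldsymbol{\Phi}_k\mathbf{X}^H_k$, where $\boldsymbol{\Phi}_k$ is a diagonal matrix. 2) In each bisection step, for a given $\lambda^k > 0$ and for each $i_k\in\mathcal{I}_k$, compute $\mathbf{V}^*_{i_k}(\lambda^k) = \mathbf{X}_k(\boldsymbol{\Phi}_k + \lambda^k\mathbf{I})^{-1}\mathbf{X}^H_k c_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(\mathbf{E}_{i_k})^{-1}$. In this way, there is no need to compute a matrix inversion in each step of the bisection.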
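The following sketch shows why the eigen-decomposition removes the inversions from the bisection. Let $\mathbf{B}_k$ denote the horizontal concatenation of the matrices $c_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(\mathbf{E}_{i_k})^{-1}$ over $i_k\in\mathcal{I}_k$. Using $\mathbf{X}^H_k\mathbf{X}_k = \mathbf{I}$, the total transmit power of cell $k$ becomes a scalar function of $\lambda^k$:

$$\sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}^*_{i_k}(\lambda^k)(\mathbf{V}^*_{i_k}(\lambda^k))^H\big] = \sum_{m=1}^{M}\frac{[\mathbf{X}^H_k\mathbf{B}_k\mathbf{B}^H_k\mathbf{X}_k]_{mm}}{([\boldsymbol{\Phi}_k]_{mm}+\lambda^k)^2},$$

which is visibly decreasing in $\lambda^k$. The Python code below takes the eigenvalues $\phi_m = [\boldsymbol{\Phi}_k]_{mm}$ and the diagonal entries $g_m = [\mathbf{X}^H_k\mathbf{B}_k\mathbf{B}^H_k\mathbf{X}_k]_{mm}$ as inputs. In practice these would come from a Hermitian eigensolver such as `numpy.linalg.eigh`; that step is omitted to keep the sketch dependency-free. The function names, the tolerance, and the example numbers are illustrative assumptions, and a positive power budget is assumed.

```python
import math


def total_power(lam, phi, g):
    """Sum over users of Tr[V V^H] in one cell, as a function of the multiplier lam."""
    total = 0.0
    for phi_m, g_m in zip(phi, g):
        if g_m == 0.0:
            continue  # eigen-directions that carry no signal contribute no power
        denom = (phi_m + lam) ** 2
        if denom == 0.0:
            return math.inf  # singular J^k at lam = 0: the unconstrained power is unbounded
        total += g_m / denom
    return total


def bisect_multiplier(phi, g, power_budget, rel_tol=1e-12):
    """Approximate the smallest lam >= 0 with total_power(lam) <= power_budget, per (32b)."""
    if total_power(0.0, phi, g) <= power_budget:
        return 0.0  # power constraint inactive, so lam* = 0
    lo, hi = 0.0, 1.0
    while total_power(hi, phi, g) > power_budget:
        hi *= 2.0  # grow the bracket until its upper end is feasible
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if total_power(mid, phi, g) > power_budget:
            lo = mid
        else:
            hi = mid
    return hi  # feasible end of the final bracket


phi = [2.0, 0.5]  # hypothetical eigenvalues of J^k
g = [4.0, 1.0]    # hypothetical diagonal of X^H B B^H X
lam_star = bisect_multiplier(phi, g, power_budget=0.5)
print(lam_star, total_power(lam_star, phi, g))
```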

Remark 2 (Relationship with the WMMSE algorithm): The algorithm listed in Table II recovers the weighted-MMSE (WMMSE) algorithm proposed recently in [34] for utility maximization problems in IBC networks¹. The original WMMSE algorithm is derived from a certain equivalence relationship between the sum-utility optimization problem and a weighted MMSE minimization problem (see [34, Section II-A]), while in this paper we arrive at this algorithm
¹There are, however, several subtle differences between these two algorithms. For example, proving convergence of the WMMSE algorithm requires that $h_{i_k}(-\log|\mathbf{X}|)$ be strictly convex in its argument and that the subproblem (IBC) admit a unique solution. The SCA-based algorithm has no such requirements.
by specializing the SCA algorithm to the IBC setting. Thus, the SCA algorithm is more general and includes the WMMSE algorithm as a special case.
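The connection can be checked on a small example. The sketch below is a hedged illustration, not the authors' code. It specializes Table II to $M = N = d_{i_k} = 1$ with one user per cell and a weighted sum-rate utility in nats, so that $c_{i_k}$ is taken to be the constant rate weight (an assumption consistent with the note after (24d)). Every matrix in (24a)-(24c) and (33) is then a scalar, and the bisection for $\lambda^k$ collapses to a closed form. Channels, weights, and budgets are random placeholders. The property checked is (25): the weighted sum-rate should never decrease.

```python
import math
import random


def sum_rate(H, v, noise, weights):
    """Weighted sum-rate in nats; H[k][j] is the channel from BS j to user k."""
    K = len(v)
    total = 0.0
    for k in range(K):
        interference = sum(abs(H[k][j] * v[j]) ** 2 for j in range(K) if j != k)
        sinr = abs(H[k][k] * v[k]) ** 2 / (interference + noise[k])
        total += weights[k] * math.log(1.0 + sinr)
    return total


def sca_iteration(H, v, noise, weights, power):
    """One outer iteration of Table II with scalar channels and one user per cell."""
    K = len(v)
    # S2): scalar versions of (24a)-(24c); c_k is the constant weighted sum-rate weight.
    C = [sum(abs(H[k][j] * v[j]) ** 2 for j in range(K)) + noise[k] for k in range(K)]
    u = [H[k][k] * v[k] / C[k] for k in range(K)]
    e = [1.0 - abs(H[k][k] * v[k]) ** 2 / C[k] for k in range(K)]
    c = list(weights)
    new_v = []
    for k in range(K):
        # Scalar form of the matrix inverted in (33): interference BS k causes at every user j.
        J = sum(c[j] * abs(H[j][k]) ** 2 * abs(u[j]) ** 2 / e[j] for j in range(K))
        b = c[k] * H[k][k].conjugate() * u[k] / e[k]
        if b == 0:
            new_v.append(0j)
        elif abs(b / J) ** 2 <= power[k]:
            new_v.append(b / J)                             # lambda^k = 0
        else:
            new_v.append(math.sqrt(power[k]) * b / abs(b))  # lambda^k = |b|/sqrt(P_k) - J > 0
    return new_v


random.seed(0)
K = 3
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)] for _ in range(K)]
noise, weights, power = [1.0] * K, [1.0] * K, [10.0] * K
v = [math.sqrt(p) + 0j for p in power]  # feasible full-power starting point
rates = [sum_rate(H, v, noise, weights)]
for _ in range(50):
    v = sca_iteration(H, v, noise, weights, power)
    rates.append(sum_rate(H, v, noise, weights))
print(f"weighted sum-rate: {rates[0]:.4f} -> {rates[-1]:.4f} nats")
```

With these scalar reductions, the precoder update coincides with the familiar WMMSE update, where the weight $c_{i_k}/e_{i_k}$ plays the role of the MSE weight.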

Remark 3 (Precoder design for HetNet with both intra-cell and inter-cell CB): The algorithm in Table II is also applicable to the linear precoder design problem in a HetNet with both intra-cell and inter-cell CB. In this case, the BSs within each cell cooperate only at the precoder-design level, and each user is served by a single BS. The entire network can then be viewed as an IBC with $\sum_{k\in\mathcal{K}}|\mathcal{Q}_k|$ BSs, so the algorithm in Table II can be directly applied.

B. IBC Model with Intra-cell ZF and Inter-cell CB Design

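We next consider an IBC model similar to the one in the previous subsection, but with each BS employing a ZF precoder to cancel the intra-cell interference among its users. We assume that some user selection has already been performed within each cell, which ensures that the per-cell zero-forcing constraint $\mathbf{H}^k_{j_k}\mathbf{V}_{i_k}\mathbf{V}^H_{i_k}(\mathbf{H}^k_{j_k})^H = \mathbf{0}$, $\forall\, j_k\neq i_k$, $j_k\in\mathcal{I}_k$, is always feasible. One easily checkable condition that guarantees feasibility is $|\mathcal{I}_k|\times N\le M$; see, e.g., [20]. Note that in this network setting, although the intra-cell interference is canceled by the ZF precoder, the inter-cell interference is still present. Hence the original problem (SYSTEM) is still difficult to solve. The subproblem (Lower-Bound) can be specialized to the following form

$$\max_{\mathbf{V}}\ \sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k} g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}) \qquad \text{(IBC-ZF1)}$$

$$\text{s.t.}\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}_{i_k}\mathbf{V}^H_{i_k}\big]\le P_k,\ k\in\mathcal{K}, \quad (34a)$$

$$\mathbf{H}^k_{j_k}\mathbf{V}_{i_k}\mathbf{V}^H_{i_k}(\mathbf{H}^k_{j_k})^H = \mathbf{0},\ \forall\, j_k\neq i_k,\ j_k\in\mathcal{I}_k,\ k\in\mathcal{K}. \quad (34b)$$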
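To remove the ZF constraints (34b), let $L = N(|\mathcal{I}_k|-1)$, and define a set of concatenated (vertically stacked) channel matrices

$$\mathbf{Q}_{i_k} = \{\mathbf{H}^k_{j_k}\}_{j_k\in\mathcal{I}_k\setminus\{i_k\}}\in\mathbb{C}^{L\times M},\quad \forall\, i_k\in\mathcal{I}_k. \quad (35)$$

Let $\mathbf{Q}_{i_k} = \mathbf{L}_{i_k}\mathbf{S}_{i_k}\mathbf{R}_{i_k}$ denote the singular value decomposition of $\mathbf{Q}_{i_k}$, where $\mathbf{L}_{i_k}\in\mathbb{C}^{L\times L}$ is unitary, $\mathbf{R}_{i_k}\in\mathbb{C}^{L\times M}$ has orthonormal rows, and $\mathbf{S}_{i_k}$ is an $L\times L$ diagonal matrix. Let $\mathbf{P}_{i_k} = (\mathbf{I} - \mathbf{R}^H_{i_k}\mathbf{R}_{i_k})$ denote the projection matrix onto the null space of $\mathbf{Q}_{i_k}$ (the orthogonal complement of its row space). Let $\bar{\mathbf{P}}_{i_k} = \bar{\mathbf{R}}_{i_k}\bar{\mathbf{R}}^H_{i_k}$, where $\bar{\mathbf{R}}_{i_k}\in\mathbb{C}^{M\times(M-L)}$ is composed of an orthonormal basis that satisfies $\mathbf{R}_{i_k}\bar{\mathbf{R}}_{i_k} = \mathbf{0}$ and $\bar{\mathbf{R}}^H_{i_k}\bar{\mathbf{R}}_{i_k} = \mathbf{I}$. Then [21, Lemma 3.1] asserts² that the optimal solution of problem (IBC-ZF1) must be of the form $\mathbf{V}_{i_k} = \bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}$, where $\mathbf{W}_{i_k}\in\mathbb{C}^{(M-L)\times d_{i_k}}$.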
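The null-space basis $\bar{\mathbf{R}}_{i_k}$ can be obtained from any orthonormalization routine; the SVD described above is one option. To keep the sketch free of third-party packages, the Python code below substitutes a modified Gram-Schmidt process for the SVD used in the text. It orthonormalizes the conjugated rows of $\mathbf{Q}_{i_k}$, extends that set with the standard basis vectors, and keeps the new vectors, which span $\{\mathbf{x}:\mathbf{Q}_{i_k}\mathbf{x} = \mathbf{0}\}$. The channel values are hypothetical, and only a single-stream precoder ($d_{i_k} = 1$) is formed.

```python
import math


def inner(a, b):
    """Complex inner product <a, b> = sum_m a_m * conj(b_m)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def gram_schmidt(vectors, basis=(), tol=1e-10):
    """Extend an orthonormal basis with the vectors that are not already in its span."""
    basis = [list(b) for b in basis]
    for vec in vectors:
        w = list(vec)
        for b in basis:  # modified Gram-Schmidt: project out one basis vector at a time
            coef = inner(w, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(abs(inner(w, w)))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def null_space_basis(Q, tol=1e-10):
    """Orthonormal basis (the columns of R_bar) of {x : Q x = 0}; Q is given as a list of rows."""
    M = len(Q[0])
    # Q x = 0 exactly when x is orthogonal to every conjugated row of Q.
    row_space = gram_schmidt([[q.conjugate() for q in row] for row in Q], tol=tol)
    standard = [[1.0 + 0j if m == n else 0j for n in range(M)] for m in range(M)]
    full = gram_schmidt(standard, basis=row_space, tol=tol)
    return full[len(row_space):]


def apply_basis(R_bar, w):
    """Single-stream precoder v = R_bar w, with R_bar stored as a list of basis vectors."""
    M = len(R_bar[0])
    return [sum(col[m] * wc for col, wc in zip(R_bar, w)) for m in range(M)]


# Hypothetical cell: M = 3 BS antennas, two other single-antenna users, so L = 2.
Q = [[1 + 0j, 1j, 0j], [0j, 1 + 0j, 1 + 0j]]
R_bar = null_space_basis(Q)
v = apply_basis(R_bar, [0.7 - 0.2j])
print(len(R_bar), [abs(sum(q * x for q, x in zip(row, v))) for row in Q])
```

Any $\mathbf{W}_{i_k}$ then yields a precoder $\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}$ that meets (34b) by construction. For this reason, the reformulated problem no longer needs to carry the zero-forcing constraints explicitly.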

²Note that Lemma 3.1 in [21] still applies in our setting, as the property derived in that lemma is not related to the form of the objective function; it is related only to the zero-forcing constraints (34b).
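Using this structure of the optimal solution, problem (IBC-ZF1) can be equivalently written as

$$\max_{\{\mathbf{W}_{i_k}\}}\ \sum_{k\in\mathcal{K}}\sum_{i_k\in\mathcal{I}_k}\bar{g}_{i_k}(\mathbf{W}_{i_k};\hat{\mathbf{V}}) \quad \text{s.t.}\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big(\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}(\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k})^H\big)\le P_k,\ k\in\mathcal{K}, \qquad \text{(IBC-ZF2)}$$

where the function $\bar{g}_{i_k}(\cdot)$ is the same as the original objective $g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}})$, except that $\mathbf{V}_{i_k}$ is replaced by $\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}$:

$$\bar{g}_{i_k}(\mathbf{W}_{i_k};\hat{\mathbf{V}}) = a_{i_k} + \mathrm{Tr}\Big[c_{i_k}\mathbf{E}^{-1}_{i_k}\big(\mathbf{U}^H_{i_k}\mathbf{H}^k_{i_k}\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k} + (\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k})^H(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\big)\Big] - \mathrm{Tr}\big[(\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k})^H\mathbf{J}^k\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}\big]. \quad (37)$$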



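Again, both the constraints and the objective of this problem are separable among the BSs (i.e., among the sets of variables $\{\mathbf{W}_{i_k}\}_{i_k\in\mathcal{I}_k}$), so we can further decompose this problem into $K$ independent subproblems of the form

$$\max_{\mathbf{W}^k}\ \bar{g}^k(\mathbf{W}^k;\hat{\mathbf{V}}) \quad \text{s.t.}\ \bar{g}^k(\mathbf{W}^k;\hat{\mathbf{V}}) = \sum_{i_k\in\mathcal{I}_k}\bar{g}_{i_k}(\mathbf{W}_{i_k};\hat{\mathbf{V}}),\quad \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big(\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}(\bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k})^H\big)\le P_k. \qquad \text{(IBC-ZF-SUB)}$$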

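Let $\lambda^k\ge 0$ denote the Lagrangian multiplier associated with the power constraint. Then, by steps similar to those leading to (33), we can show that the optimal solution of problem (IBC-ZF-SUB) is of the form

$$\mathbf{W}^*_{i_k} = \Big(\sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(\mathbf{H}^k_{j_\ell}\bar{\mathbf{R}}_{i_k})^H\mathbf{U}_{j_\ell}\mathbf{E}^{-1}_{j_\ell}\mathbf{U}^H_{j_\ell}\mathbf{H}^k_{j_\ell}\bar{\mathbf{R}}_{i_k} + \lambda^k\mathbf{I}_{M-L}\Big)^{-1}c_{i_k}\bar{\mathbf{R}}^H_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\mathbf{E}^{-1}_{i_k},\quad \forall\, i_k\in\mathcal{I}_k, \quad (39)$$

where $\lambda^k$ can be computed by a bisection method.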

The algorithm discussed in this section is summarized in Table III.

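TABLE III

The Proposed Algorithm for (SYSTEM) in the IBC Setting with Intra-cell ZF and Inter-cell CB

S1): Initialization

S1a): For each BS $k$, compute $\mathbf{Q}_{i_k} = \mathbf{L}_{i_k}\mathbf{S}_{i_k}\mathbf{R}_{i_k}$, $\mathbf{P}_{i_k} = (\mathbf{I} - \mathbf{R}^H_{i_k}\mathbf{R}_{i_k})$, and $\bar{\mathbf{P}}_{i_k} = \bar{\mathbf{R}}_{i_k}\bar{\mathbf{R}}^H_{i_k}$, $\forall\, i_k\in\mathcal{I}_k$;

S1b): Obtain a feasible solution $\mathbf{V}_{i_k}(0)$ for all $i_k\in\mathcal{I}$;

S2): For each BS $k$, compute $\mathbf{C}_{i_k}(t)$, $\mathbf{E}_{i_k}(t)$, $\mathbf{U}_{i_k}(t)$ and $c_{i_k}(t)$ according to (24a)-(24d), for all $i_k\in\mathcal{I}_k$;

S3a): $\mathbf{W}_{i_k}(t) = \big(\sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(t)(\mathbf{H}^k_{j_\ell}\bar{\mathbf{R}}_{i_k})^H\mathbf{U}_{j_\ell}(t)(\mathbf{E}_{j_\ell}(t))^{-1}(\mathbf{U}_{j_\ell}(t))^H\mathbf{H}^k_{j_\ell}\bar{\mathbf{R}}_{i_k} + (\lambda^k)^*\mathbf{I}_{M-L}\big)^{-1} \times c_{i_k}(t)\bar{\mathbf{R}}^H_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(t)(\mathbf{E}_{i_k}(t))^{-1}$, $\forall\, i_k\in\mathcal{I}_k$,
where $(\lambda^k)^*$ is computed by a bisection procedure;

S3b): $\mathbf{V}_{i_k}(t) = \bar{\mathbf{R}}_{i_k}\mathbf{W}_{i_k}(t)$;

S4): Until some stopping criterion is met.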
C. Linear Precoder Design for HetNet with Intra-Cell Full ComP and Inter-Cell CB

Consider a HetNet setting in which each cell contains a set $\mathcal{Q}_k$ of BSs that form a single virtual BS to transmit to the users. The objective is to design the virtual linear precoders for the users subject to the sum-power constraint of each individual BS. In this case, $\mathcal{V}^{q_k}$ becomes $\mathcal{V}^{q_k} = \{\mathbf{V}^{q_k} : \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}[\mathbf{V}^{q_k}_{i_k}(\mathbf{V}^{q_k}_{i_k})^H]\le P^{q_k}\}$. Assume for now that there is no penalty term $s(\mathbf{V})$. Then the subproblem (Lower-Bound) can again be decomposed into $K$ independent subproblems of the form

$$\max_{\mathbf{V}^k}\ \sum_{i_k\in\mathcal{I}_k}g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}) \quad \text{s.t.}\ \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}^{q_k}_{i_k}(\mathbf{V}^{q_k}_{i_k})^H\big]\le P^{q_k},\ \forall\, q_k\in\mathcal{Q}_k. \qquad \text{(VIBC-SUB)}$$

Unlike problem (IBC-SUB) discussed in Section IV-A, the above problem has $Q_k = |\mathcal{Q}_k|$ separable constraints, each constraining a subset of the variables. It therefore has $Q_k$ Lagrangian multipliers $\{\lambda^{q_k}\}_{q_k\in\mathcal{Q}_k}$, and the single-multiplier bisection used in Table II does not work in this case.

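Fortunately, the constraints of this problem are separable among the block variables $\{\mathbf{V}^{q_k}\}_{q_k\in\mathcal{Q}_k}$ (that is, the precoders belonging to different BSs). Therefore, a natural way to obtain the optimal solution of problem (VIBC-SUB) is to use a block coordinate descent (BCD) algorithm (see [64], [65]), which updates one block variable $\mathbf{V}^{q_k}$ at a time while holding the remaining block variables fixed. To capitalize on the block structure of the objective of problem (VIBC-SUB), the following definitions are needed. Let $\mathbf{S}_{i_k}\triangleq c_{i_k}(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}\mathbf{E}^{-1}_{i_k}$ for all $i_k\in\mathcal{I}_k$ (cf. step S3b) of Table IV). Partition $\mathbf{J}^k$ (which is defined in (22)) and $\mathbf{S}_{i_k}$ into the following form

$$\mathbf{J}^k = \begin{bmatrix}\mathbf{J}^k[1,1] & \cdots & \mathbf{J}^k[1,Q_k]\\ \vdots & \ddots & \vdots\\ \mathbf{J}^k[Q_k,1] & \cdots & \mathbf{J}^k[Q_k,Q_k]\end{bmatrix}, \quad (40)$$

$$\mathbf{S}_{i_k} = \begin{bmatrix}\mathbf{S}_{i_k}[1]\\ \vdots\\ \mathbf{S}_{i_k}[Q_k]\end{bmatrix}, \quad (41)$$

where $\mathbf{J}^k[q,p]\in\mathbb{C}^{M\times M}$, $\forall\,(q,p)\in\mathcal{Q}_k\times\mathcal{Q}_k$, and $\mathbf{S}_{i_k}[q]\in\mathbb{C}^{M\times d_{i_k}}$. Then the function $g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}})$ defined in (21) can be expressed as

$$g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}) = a_{i_k} + \sum_{p_k\in\mathcal{Q}_k}2\,\mathrm{Re}\,\mathrm{Tr}\big[\mathbf{S}^H_{i_k}[p_k]\mathbf{V}^{p_k}_{i_k}\big] - \sum_{p_k,q_k\in\mathcal{Q}_k}\mathrm{Tr}\big[(\mathbf{V}^{p_k}_{i_k})^H\mathbf{J}^k[p_k,q_k]\mathbf{V}^{q_k}_{i_k}\big]. \quad (42)$$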



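Clearly, $\sum_{i_k\in\mathcal{I}_k}g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}})$ is again a quadratic function with respect to any one block variable, say $\mathbf{V}^{m_k}$. It follows that the per-block problem, written in the following form, can be solved efficiently in closed form:

$$\max_{\mathbf{V}^{m_k}}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) \quad \text{s.t.}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) = \sum_{i_k\in\mathcal{I}_k}g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}),\quad \mathrm{Tr}\big[\mathbf{V}^{m_k}(\mathbf{V}^{m_k})^H\big]\le P^{m_k}. \qquad \text{(VIBC-BLK)}$$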

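Let $\lambda^{m_k}\ge 0$ denote the Lagrangian multiplier associated with the power constraint of the $m_k$-th subproblem. Following the same derivation as in Section IV-A, the optimal solution $(\mathbf{V}^{m_k})^*$ of problem (VIBC-BLK) can be expressed as

$$(\mathbf{V}^{m_k}_{i_k})^* = \big(\mathbf{J}^k[m_k,m_k] + (\lambda^{m_k})^*\mathbf{I}_M\big)^{-1}\Big(\mathbf{S}_{i_k}[m_k] - \sum_{p_k\neq m_k}\mathbf{J}^k[m_k,p_k]\mathbf{V}^{p_k}_{i_k}\Big),\quad \forall\, i_k\in\mathcal{I}_k, \quad (43)$$

where the optimal multiplier can again be computed using a bisection search.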
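The following Python sketch runs the cyclic BCD of (43) on a deliberately small instance. It assumes that each BS has a single antenna and that the cell serves a single single-stream user. Under these assumptions, $\mathbf{J}^k$ is a $Q_k\times Q_k$ Hermitian positive definite matrix, $\mathbf{S}_{i_k}$ is a length-$Q_k$ vector, and each block $\mathbf{V}^{q_k}_{i_k}$ is a scalar. In this case the multiplier in (43) has a closed form, so no bisection is needed. The matrices, budgets, and function names are hypothetical.

```python
import math


def objective(J, s, v):
    """2 Re(s^H v) - v^H J v: the V-dependent part of (42) for one user with scalar blocks."""
    n = len(v)
    lin = sum(s[q].conjugate() * v[q] for q in range(n))
    quad = sum(v[p].conjugate() * J[p][q] * v[q] for p in range(n) for q in range(n))
    return 2.0 * lin.real - quad.real


def block_update(J, s, v, q, power):
    """Closed form of (43) for a scalar block v[q] with |v[q]|^2 <= power."""
    b = s[q] - sum(J[q][p] * v[p] for p in range(len(v)) if p != q)
    candidate = b / J[q][q].real
    if abs(candidate) ** 2 <= power:
        return candidate                  # multiplier is zero
    return math.sqrt(power) * b / abs(b)  # multiplier |b|/sqrt(power) - J[q][q] > 0


def bcd(J, s, powers, sweeps=50):
    """Cyclic BCD over the BSs of one cell, recording the objective after every block update."""
    v = [0j] * len(s)
    history = [objective(J, s, v)]
    for _ in range(sweeps):
        for q in range(len(v)):  # cyclically pick m_k in Q_k
            v[q] = block_update(J, s, v, q, powers[q])
            history.append(objective(J, s, v))
    return v, history


# Hypothetical cell with two single-antenna BSs serving one user.
J = [[2.0 + 0j, 0.5 - 0.5j], [0.5 + 0.5j, 1.0 + 0j]]  # Hermitian positive definite
s = [1.0 + 1.0j, 2.0 - 0.5j]
powers = [0.3, 0.5]
v_opt, history = bcd(J, s, powers)
print(v_opt, history[-1])
```

Each block update is an exact maximization over its own power ball, so the recorded objective cannot decrease. This is the property that the convergence argument for step S4) of Table IV builds on.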

In summary, in the presence of multiple BSs with individual power constraints, the proposed algorithm consists of two layers: i) an outer layer that updates $\mathbf{E}_{i_k}(t)$, $\mathbf{U}_{i_k}(t)$, $c_{i_k}(t)$, $\mathbf{J}^k(t)$ and $\mathbf{S}_{i_k}(t)$; and ii) an inner layer that updates each $\mathbf{V}^k$ by a BCD algorithm with blocks given by $\{\mathbf{V}^{q_k}\}_{q_k\in\mathcal{Q}_k}$. The overall algorithm is detailed in Table IV.

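TABLE IV

The Proposed Algorithm for Solving (SYSTEM) with Intra-cell ComP and Inter-cell CB

S1): Initialization: Obtain a feasible solution $\mathbf{V}_{i_k}(0)$ for all $i_k\in\mathcal{I}$;

S2): For each BS $k$, compute $\mathbf{C}_{i_k}(t)$, $\mathbf{E}_{i_k}(t)$, $\mathbf{U}_{i_k}(t)$ and $c_{i_k}(t)$ according to (24a)-(24d), for all $i_k\in\mathcal{I}_k$;

S3): For each BS $k$, compute the following:

S3a): $\mathbf{J}^k(t) = \sum_{j_\ell\in\mathcal{I}}c_{j_\ell}(t)(\mathbf{H}^k_{j_\ell})^H\mathbf{U}_{j_\ell}(t)(\mathbf{E}_{j_\ell}(t))^{-1}(\mathbf{U}_{j_\ell}(t))^H\mathbf{H}^k_{j_\ell}$;
S3b): $\mathbf{S}_{i_k}(t) = c_{i_k}(t)(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(t)(\mathbf{E}_{i_k}(t))^{-1}$, $\forall\, i_k\in\mathcal{I}_k$;

S4): For each BS $k$, compute the precoders $\mathbf{V}^k(t)$ by

Repeat: Cyclically pick $m_k\in\mathcal{Q}_k$;

Compute $(\mathbf{V}^{m_k}_{i_k})^*$ using (43), $\forall\, i_k\in\mathcal{I}_k$,

where $(\lambda^{m_k})^*$ is computed by a bisection procedure;

Until the desired stopping criterion is met;

Let $\mathbf{V}^{m_k}_{i_k}(t) = (\mathbf{V}^{m_k}_{i_k})^*$, $\forall\, i_k, m_k$;

S5): Until some stopping criterion is met.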



Remark 4: The convergence of the BCD step S4) to the globally optimal solution of the subproblem (VIBC-SUB) can be shown using a standard argument such as [65, Theorem 4.1].

Remark 5 (Hybrid implementation): Interestingly, the proposed framework also allows for hybrid cooperation schemes, in which each cell can choose to serve its users using either ZF-based precoding or general linear precoding. Moreover, some cells can have only a single BS, while the rest have multiple BSs. Such a hybrid implementation amounts to letting each subproblem (Lower-Bound) take a different form of constraints. As long as each subproblem can be solved to its global optimality, the convergence of the overall algorithm is guaranteed.

Remark 6 (Per-cell partial ComP): The algorithm proposed in this section can be easily extended to the case of per-cell partial ComP.

Assume that the BS clustering structure is known³, and let $\mathcal{S}^{q_k}\subseteq\mathcal{I}_k$ denote the set of users served by BS $q_k$. Then we only need to slightly modify the proposed algorithm in Table IV as follows:

1) In S1), for each BS $q_k\in\mathcal{Q}_k$, set $\mathbf{V}^{q_k}_{i_k}(0) = \mathbf{0}$ for all $i_k\notin\mathcal{S}^{q_k}$, and find a feasible solution for the remaining users;

2) In S4), let each BS $m_k\in\mathcal{Q}_k$ compute $\mathbf{V}^{m_k}_{i_k}(t)$ using (43), $\forall\, i_k\in\mathcal{S}^{m_k}$ (instead of for all $i_k\in\mathcal{I}_k$).

In this way, only the precoders of the subset of users served by each BS are updated in each iteration. Once again, the computation in each iteration admits a closed-form solution, whereas related works such as [53] require general-purpose convex solvers for the subproblems⁴.

Moreover, when the BSs' clustering structure needs to be designed jointly with the precoders, we can include the penalty term $s(\mathbf{V})$ in the objective to induce block sparsity in the precoder $\mathbf{V}^{q_k}$. Apart from this additional term in the objective, which changes the solution of the associated subproblem, the algorithm for the joint BS clustering and precoder design problem is the same as the one in Table IV. Specifically, the per-block subproblem (VIBC-BLK) takes the following form

$$\max_{\mathbf{V}^{m_k}}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) - \sum_{i_k\in\mathcal{I}_k}s^{m_k}_{i_k}(\mathbf{V}^{m_k}_{i_k}) \quad (44)$$

$$\text{s.t.}\ g^k(\mathbf{V}^k;\hat{\mathbf{V}}) = \sum_{i_k\in\mathcal{I}_k}g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}),\quad \sum_{i_k\in\mathcal{I}_k}\mathrm{Tr}\big[\mathbf{V}^{m_k}_{i_k}(\mathbf{V}^{m_k}_{i_k})^H\big]\le P^{m_k}.$$

In particular, when we let $s^{m_k}_{i_k}(\mathbf{V}^{m_k}_{i_k}) = \|\mathbf{V}^{m_k}_{i_k}\|$, this subproblem becomes a well-known quadratic group-LASSO problem [66] (with an additional quadratic constraint), which can be solved using an iterative procedure. We refer the reader to [54] for the detailed algorithm⁵.
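To see how the penalty $\|\cdot\|$ induces block sparsity, consider a simplified version of (44). For illustration only, assume that $\mathbf{J}^k[m_k,m_k] = a\mathbf{I}_M$ with $a > 0$, that the penalty carries a weight $\tau\ge 0$, and that there is a single user. Define $\mathbf{b} = \mathbf{S}_{i_k}[m_k] - \sum_{p_k\neq m_k}\mathbf{J}^k[m_k,p_k]\mathbf{V}^{p_k}_{i_k}$. The block problem $\max\ 2\,\mathrm{Re}\,\mathrm{Tr}(\mathbf{b}^H\mathbf{V}) - a\|\mathbf{V}\|_F^2 - \tau\|\mathbf{V}\|_F$ s.t. $\|\mathbf{V}\|_F^2\le P$ then has the closed form $\mathbf{V} = \min\big(\max(0,\|\mathbf{b}\|_F - \tau/2)/a,\ \sqrt{P}\big)\,\mathbf{b}/\|\mathbf{b}\|_F$, which is a group soft-thresholding step. This is not the general algorithm of [54]; it is only the special case in which the update fits in one line. The names and numbers in the Python sketch below are hypothetical.

```python
import math


def frob_norm(x):
    """Frobenius norm of a flattened complex block."""
    return math.sqrt(sum(abs(z) ** 2 for z in x))


def sparse_block_update(b, a, tau, power):
    """Maximize 2 Re<b, V> - a ||V||^2 - tau ||V|| subject to ||V||^2 <= power.

    Assumes J[m, m] = a * I with a > 0; b is the flattened S[m] - sum_{p != m} J[m, p] V^p.
    """
    nb = frob_norm(b)
    if nb == 0.0:
        return [0j] * len(b)
    radius = max(0.0, nb - tau / 2.0) / a   # group soft-thresholding
    radius = min(radius, math.sqrt(power))  # stay inside the per-BS power ball
    return [radius * z / nb for z in b]     # align V with b


# Hypothetical blocks: a weakly and a strongly coupled BS-user pair.
for name, b in (("weak", [0.2 + 0.1j, -0.1j]), ("strong", [1.5 - 0.5j, 0.8 + 0.3j])):
    V = sparse_block_update(b, a=2.0, tau=1.0, power=0.2)
    print(name, frob_norm(V))
```

A block whose coupling strength $\|\mathbf{b}\|_F$ falls below $\tau/2$ is set exactly to zero, which is how the penalty removes a BS from a user's serving cluster. For a general $\mathbf{J}^k[m_k,m_k]$ the update no longer has this one-line form, and an iterative method such as the one in [54] is needed.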

The algorithm proposed in Table IV can be viewed as a double time-scale algorithm. The subproblem (VIBC-SUB) is solved on a relatively fast time scale by BCD iterations (i.e., step S4)), while the computation in S2)-S3) is performed less frequently, on a slow time scale.

³Such a clustering structure can be determined, for example, by using a simple heuristic path-loss model. A particularly useful scheme for BS clustering is to serve each user by the set of BSs adjacent to it, where the closeness of the BSs to the users is measured by the path-loss coefficients between the users and the BSs.

⁴The subproblem used in [53] is derived from a difference-of-convex-functions (d.c.) property of the weighted sum-rate objective, and it therefore takes a very different form from the one used in the present work.

⁵Reference [54] solves a slightly simplified beamforming problem with $N = 1$ and $d_{i_k} = 1$ for all $i_k\in\mathcal{I}_k$, that is, with a single data stream for each user. Nevertheless, the algorithm therein can be easily extended to the more general case with $N > 1$ and $d_{i_k} > 1$.
One desirable feature of this separation of time scales is that, on the fast time scale, interactions are confined within each cell: no coordination or message exchange is required among the BSs of different cells. On the other hand, to guarantee the convergence of the overall algorithm, the fast time-scale computation must be carried out until the subproblem (VIBC-SUB) is solved exactly. This requirement turns out to be quite inflexible for practical implementation, for the following reasons:

1) It is usually difficult to check whether the inner iteration has indeed reached optimality;

2) Before the subproblem (VIBC-SUB) reaches optimality, the marginal benefit of the precoder updates in the inner iteration decreases as the iteration progresses. This effect is especially pronounced in the first few outer iterations, where, even if the inner problem is solved exactly, the resulting precoders are still far from the optimal ones.

Heuristically, one may resort to running a few BCD iterations, or even a single one, for each cell in Step S4), but the convergence properties of such heuristics appear hard to establish. Fortunately, the universal lower bound established in Section III allows us to develop a different version of the SCA algorithm, in which each subproblem (Lower-Bound) is solved inexactly. The benefit of such an algorithm is clear from the preceding discussion: it allows the subproblems to be solved approximately at the beginning and more accurately as the iteration progresses. In the next section, we present the "inexact" version of the SCA algorithm in detail, analyze its convergence properties, and demonstrate its application to the precoder design problem in the HetNet.

V. An Extended Inexact SCA Approach 

In this section, we present an important extension of the SCA algorithm: a single time-scale algorithm without any inner iterations. Once again, this single time-scale implementation hinges on our ability to decompose the original problem (SYSTEM) into a sequence of separable convex subproblems of the form (Lower-Bound). The key difference between the inexact version of the SCA algorithm and its original counterpart is the following. In each iteration, the inexact version updates the precoder $\mathbf{V}$ with a reasonably "good" step that improves the objective of problem (Lower-Bound), instead of solving subproblem (Lower-Bound) to its global optimality.

For clarity of presentation, we first introduce the algorithm in its general form and prove its convergence. As an example, we then specialize it to the HetNet linear transceiver design problem studied in Section IV-C.
A. The Inexact-SCA Algorithm

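For simplicity of notation, let us introduce the following definitions:

$$g^k(\mathbf{V}^k;\hat{\mathbf{V}}) \triangleq \sum_{i_k\in\mathcal{I}_k}g_{i_k}(\mathbf{V}_{i_k};\hat{\mathbf{V}}), \quad (45a)$$

$$s^k(\mathbf{V}^k) \triangleq \sum_{i_k\in\mathcal{I}_k}\sum_{q_k\in\mathcal{Q}_k}s^{q_k}_{i_k}(\mathbf{V}^{q_k}_{i_k}), \quad (45b)$$

$$\pi^k(\mathbf{V}^k;\hat{\mathbf{V}}) \triangleq g^k(\mathbf{V}^k;\hat{\mathbf{V}}) - s^k(\mathbf{V}^k). \quad (45c)$$

Using these definitions, the overall system lower bound $h(\mathbf{V};\hat{\mathbf{V}}) - s(\mathbf{V})$ can be expressed as

$$h(\mathbf{V};\hat{\mathbf{V}}) - s(\mathbf{V}) = \sum_{k\in\mathcal{K}}\pi^k(\mathbf{V}^k;\hat{\mathbf{V}}). \quad (46)$$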



The proposed inexact algorithm consists of the following main steps. Fig. 4 gives a graphical illustration of the proposed algorithm.






Fig. 4. A graphical illustration of the proposed Inexact-SCA algorithm, assuming $s(\mathbf{V})$ is not present. At $\mathbf{V}(0)$, a concave function $h(\mathbf{V};\mathbf{V}(0))$ is used to approximate the original non-convex function $f(\mathbf{V})$. A good step that increases the objective value is taken to update the solution to $\mathbf{V}(1)$. Then the concave function $h(\mathbf{V};\mathbf{V}(1))$ is constructed at the point $\mathbf{V}(1)$, and so on. Continuing this process, a stationary solution of the original non-convex problem can be found.



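Step 1 (Find update direction): Suppose $\mathbf{V}(t-1)$ is a feasible solution to (SYSTEM). At iteration $t$, we solve the following convex optimization problem for each cell $k\in\mathcal{K}$:

$$\max_{\mathbf{D}^k}\ (g^k)'_{\mathbf{D}^k}\big(\mathbf{V}^k(t-1);\mathbf{V}(t-1)\big) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k\big] - s^k\big(\mathbf{D}^k+\mathbf{V}^k(t-1)\big) \qquad \text{(Q)}$$

$$\text{s.t.}\ \mathbf{D}^k+\mathbf{V}^k(t-1)\in\mathcal{V}^k,\quad \mathbf{D}^{q_k}+\mathbf{V}^{q_k}(t-1)\in\mathcal{V}^{q_k},\ \forall\, q_k\in\mathcal{Q}_k,$$

where $-\mathbf{G}^k(\mathbf{V}(t-1))\succ 0$, and $\mathrm{Tr}[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k]$ is an approximation of the second-order directional derivative of $g^k(\cdot;\mathbf{V}(t-1))$ at the point $\mathbf{V}(t-1)$. Let $\mathbf{D}^k(t)$ denote the optimal solution of subproblem (Q).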
Step 2 (Armijo step-size selection): For each cell $k$, choose the constants $\sigma\in(0,1)$, $\alpha^{\max}>0$ and $\beta\in(0,1)$. Let $\alpha^k(t)$ be the largest element in $\{\alpha^{\max}\beta^j\}_{j=0,1,\ldots}$ satisfying

$$\pi^k\big(\mathbf{V}^k(t-1)+\alpha^k(t)\mathbf{D}^k(t);\mathbf{V}(t-1)\big) \ge \pi^k\big(\mathbf{V}^k(t-1);\mathbf{V}(t-1)\big) + \sigma\alpha^k(t)\Big((g^k)'_{\mathbf{D}^k(t)}\big(\mathbf{V}^k(t-1);\mathbf{V}(t-1)\big) - s^k\big(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)\big) + s^k\big(\mathbf{V}^k(t-1)\big)\Big). \quad (48)$$

Step 3 (Update precoder): Let $\mathbf{V}^k(t) = \mathbf{V}^k(t-1) + \alpha^k(t)\mathbf{D}^k(t)$, $\forall\, k\in\mathcal{K}$.

Step 4 (Update lower bound): For each user $i_k\in\mathcal{I}$, compute (24a)-(24d). Compute the updated lower bound function $h(\mathbf{V};\mathbf{V}(t))$ according to (21).

Step 5: Let $t = t+1$, and go to Step 1.
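Step 2 is an Armijo backtracking search. The generic Python sketch below makes two assumptions. First, the cell objective $\pi^k(\cdot;\mathbf{V}(t-1))$ is available as a callable on a flattened variable. Second, the bracketed term in (48) has already been evaluated and is passed in as `predicted_increase`. The toy objective and the parameter values ($\alpha^{\max} = 1$, $\beta = 0.5$, $\sigma = 0.1$) are hypothetical choices.

```python
def armijo_step(phi, x, d, predicted_increase, alpha_max=1.0, beta=0.5, sigma=0.1,
                max_trials=60):
    """Largest alpha in {alpha_max * beta**j} with
    phi(x + alpha*d) >= phi(x) + sigma * alpha * predicted_increase."""
    base = phi(x)
    alpha = alpha_max
    for _ in range(max_trials):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if phi(trial) >= base + sigma * alpha * predicted_increase:
            return alpha
        alpha *= beta
    return 0.0  # give up: a zero step leaves x unchanged


def phi(x):
    """Hypothetical concave objective with maximizer (3, -1)."""
    return -(x[0] - 3.0) ** 2 - (x[1] + 1.0) ** 2


x0 = [0.0, 0.0]
d = [6.0, -2.0]                  # gradient of phi at x0, an ascent direction
slope = 6.0 * 6.0 + 2.0 * 2.0    # directional derivative grad . d = 40
print(armijo_step(phi, x0, d, slope))
```

The search returns the first, and therefore largest, trial step that achieves at least a fraction $\sigma$ of the predicted increase. The cap on the number of trials is a safeguard added in this sketch.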

Compared with the exact version of the SCA algorithm, each iteration now solves subproblem (Q) instead of subproblem (Lower-Bound). One may wonder why this is an easier task, as the new subproblem also appears to be a quadratic problem with many constraints. The flexibility gained here is the freedom to choose the matrix $\mathbf{G}^k(\mathbf{V})$. We will see shortly that as long as this matrix is chosen to be negative definite and continuous in $\mathbf{V}$, the convergence of the algorithm is guaranteed. This allows us to choose the matrix so that problem (Q) further decomposes into $|\mathcal{Q}_k|$ subproblems (one for each BS $q_k$), each of which is potentially easy to solve. How this can be done is, of course, highly problem dependent, so we discuss it later in the context of specific applications.

Before analyzing the convergence of the proposed algorithm, we present two technical lemmas, whose proofs can be found in Appendices B and C. The first lemma bounds the improvement of $\pi^k(\cdot;\hat{\mathbf{V}})$ from before to after the precoder update, and hence the improvement of the lower bound $h(\cdot;\hat{\mathbf{V}}) - s(\cdot)$ (cf. (46)). The second lemma shows that if $\mathbf{V}(t) = \mathbf{V}(t-1)$, then a stationary solution of problem (SYSTEM) has been reached.

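Lemma 2: Suppose the second-order directional derivative of the concave function $g^k(\cdot;\hat{\mathbf{V}})$ is lower bounded; that is, for all feasible $\mathbf{V}^k$, $\hat{\mathbf{V}}$ and all feasible directions $\mathbf{D}^k$, there exists a constant $B^k>0$ such that

$$(g^k)''_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}) \ge -B^k\,\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{D}^k\big]. \quad (49)$$

Then the following inequality holds for any $\alpha\in(0,1]$:

$$\pi^k(\mathbf{V}^k+\alpha\mathbf{D}^k;\hat{\mathbf{V}}) - \pi^k(\mathbf{V}^k;\hat{\mathbf{V}}) \ge \alpha\,(g^k)'_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}) - \frac{\alpha^2B^k}{2}\mathrm{Tr}\big[\mathbf{D}^k(\mathbf{D}^k)^H\big] - \alpha\big(s^k(\mathbf{V}^k+\mathbf{D}^k) - s^k(\mathbf{V}^k)\big). \quad (50)$$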

Lemma 3: If, for a given precoder $\mathbf{V}(t-1)$, the optimal solution of subproblem (Q) is $\mathbf{D}^k(t) = \mathbf{0}$ for each $k\in\mathcal{K}$, then $\mathbf{V}(t-1)$ is a stationary solution of the sum-utility maximization problem (SYSTEM).

We now analyze the convergence of the proposed algorithm. The proof of the following result is given in Appendix D.
Theorem 2: Assume the second-order directional derivative of $g^k(\cdot;\hat{\mathbf{V}})$ is lower bounded as in (49). Further assume that $-\mathbf{G}^k(\mathbf{V})\succ 0$ for all feasible $\mathbf{V}$ and for all $k$, and that $\mathbf{G}^k(\mathbf{V})$ is continuous in $\mathbf{V}$. Then the inexact-SCA algorithm converges to a stationary solution of problem (SYSTEM).

Remark 7: In the literature, algorithms related to the inexact-SCA include the block successive convex approximation (BSCA) algorithm recently proposed in [62] and the coordinate gradient descent (CGD) method proposed in [67] (see also [64]). The BSCA (resp. CGD) computes a stationary solution of a non-convex problem with a smooth objective function $f(\cdot)$ (resp. a smooth plus separable nonsmooth function $f(\cdot) + s(\cdot)$) by solving a sequence of convex approximate subproblems. As in the inexact-SCA algorithm, the update direction is computed by solving a strictly convex problem, and the step size is determined by the Armijo rule.

The inexact-SCA and the two methods mentioned above, together with their convergence results, differ in several important respects. On the one hand, the BSCA and CGD appear more general, in that the variables can be updated block by block. On the other hand, the inexact-SCA finds a good direction for improving the lower bound $h(\cdot) - s(\cdot)$ of the original objective, whereas both the BSCA and CGD try to improve the original objective directly. In the present application, the approach adopted by the inexact-SCA is more favorable. Thanks to the separability of the lower bound $h(\cdot) - s(\cdot)$, each cell $k$ can easily check the improvement of its value before and after the update by checking $g^k(\cdot) - s^k(\cdot)$ (cf. (48)). The same cannot be done with the original objective function. Moreover, if we were to adopt either the BSCA or the CGD algorithm, it is not immediately clear how the subproblem for computing the update direction could be formulated.

B. Application to Linear Precoder Design in HetNet 

Having presented the inexact-SCA algorithm and its convergence property, we now demonstrate how it can be used effectively in precoder design problems, taking the linear precoder design problem of Section IV-C as an example.

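First, let us compute the first- and second-order directional derivatives of $g^k(\mathbf{V}^k;\hat{\mathbf{V}})$ at the point $\mathbf{V}^k$. Using the definitions of $\mathbf{S}_{i_k}$ and $\mathbf{J}^k$ given in (22) and (40), we have

$$(g^k)'_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}) = \sum_{q_k\in\mathcal{Q}_k}\sum_{i_k\in\mathcal{I}_k}2\,\mathrm{Re}\Big(\mathrm{Tr}\Big[\Big(\mathbf{S}_{i_k}[q_k] - \sum_{p_k\in\mathcal{Q}_k}\mathbf{J}^k[q_k,p_k]\mathbf{V}^{p_k}_{i_k}\Big)^H\mathbf{D}^{q_k}_{i_k}\Big]\Big),$$

$$(g^k)''_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}) = -2\,\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{J}^k\mathbf{D}^k\big].$$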








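Let us choose the matrix $\mathbf{G}^k$ as $-2\times\mathrm{blkdiag}(\mathbf{J}^k)$, that is,

$$\mathbf{G}^k = \begin{bmatrix}-2\mathbf{J}^k[1,1] & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & -2\mathbf{J}^k[2,2] & \cdots & \mathbf{0}\\ \vdots & & \ddots & \vdots\\ \mathbf{0} & \cdots & \mathbf{0} & -2\mathbf{J}^k[Q_k,Q_k]\end{bmatrix}. \quad (51)$$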



This matrix is clearly continuous in $\mathbf{V}$, because the matrix $\mathbf{J}^k$, defined in (22), is continuous in $\mathbf{V}$.

Note that other choices of $\mathbf{G}^k$ are also possible (e.g., $\mathbf{G}^k = -\mathbf{I}$). In practice, however, choosing $\mathbf{G}^k$ in the form of (51) can effectively accelerate the convergence of the overall algorithm, because it approximates the second-order directional derivative of $g^k(\mathbf{V}^k;\hat{\mathbf{V}})$.

As there is no penalty term, the objective of subproblem (Q) at a point $\mathbf{V}$ can be written as

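$$\sum_{q_k\in\mathcal{Q}_k}\sum_{i_k\in\mathcal{I}_k}\Big(2\,\mathrm{Re}\,\mathrm{Tr}\Big[\Big(\mathbf{S}_{i_k}[q_k] - \sum_{p_k\in\mathcal{Q}_k}\mathbf{J}^k[q_k,p_k]\mathbf{V}^{p_k}_{i_k}\Big)^H\mathbf{D}^{q_k}_{i_k}\Big] - \mathrm{Tr}\big[(\mathbf{D}^{q_k}_{i_k})^H\mathbf{J}^k[q_k,q_k]\mathbf{D}^{q_k}_{i_k}\big]\Big)$$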

For notational simplicity, let us define 








(|^A/|Ifc|xM|X,.| 



(52) 



(53) 



••• -2J'=[gfe,gfe] ^ 

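When the feasible sets are characterized by the per-BS power constraints $\mathcal{V}^{q_k} = \{\mathbf{V}^{q_k} : \mathrm{Tr}[\mathbf{V}^{q_k}(\mathbf{V}^{q_k})^H]\le P^{q_k}\}$, the subproblem (Q) can again be decomposed into $|\mathcal{Q}_k|$ subproblems, one for each BS $q_k$:

$$\max_{\mathbf{Y}^{q_k}}\ \sum_{i_k\in\mathcal{I}_k}\Big(2\,\mathrm{Re}\,\mathrm{Tr}\Big[\Big(\mathbf{S}_{i_k}[q_k] - \sum_{p_k\in\mathcal{Q}_k\setminus q_k}\mathbf{J}^k[q_k,p_k]\mathbf{V}^{p_k}_{i_k}\Big)^H\mathbf{Y}^{q_k}_{i_k}\Big] - \mathrm{Tr}\big[(\mathbf{Y}^{q_k}_{i_k})^H\mathbf{J}^k[q_k,q_k]\mathbf{Y}^{q_k}_{i_k}\big]\Big) \quad \text{s.t.}\ \mathrm{Tr}\big[\mathbf{Y}^{q_k}(\mathbf{Y}^{q_k})^H\big]\le P^{q_k}, \qquad \text{(VIBC-I)}$$

where we have made the change of variables $\mathbf{Y}^{q_k} = \mathbf{V}^{q_k} + \mathbf{D}^{q_k}$ (dropping terms that do not depend on $\mathbf{Y}^{q_k}$). This problem is again a quadratic problem with a single ball constraint. Let $\lambda^{q_k}\ge 0$ denote the Lagrangian multiplier associated with the power constraint. Using the same argument as the one leading to (33), we can show that the optimal solution of subproblem (VIBC-I) is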



(54) 



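or, equivalently,

$$(\mathbf{Y}^{q_k}_{i_k})^* = \big(\mathbf{J}^k[q_k,q_k] + (\lambda^{q_k})^*\mathbf{I}_M\big)^{-1}\Big(\mathbf{S}_{i_k}[q_k] - \sum_{p_k\in\mathcal{Q}_k\setminus q_k}\mathbf{J}^k[q_k,p_k]\mathbf{V}^{p_k}_{i_k}\Big),\quad \forall\, i_k\in\mathcal{I}_k. \quad (55)$$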
It follows that the optimal direction for BS $q_k$ to update its precoder is $\mathbf{D}^{q_k} = \mathbf{Y}^{q_k} - \mathbf{V}^{q_k}$. After the BSs finish computing the direction matrices $\{\mathbf{D}^{q_k}\}_{q_k\in\mathcal{Q}_k}$, the macro BS can collect them via the backhaul link and compute the step size using the Armijo rule. In this case, the Armijo rule (48) simplifies to selecting the largest $\alpha^k$ in $\{\alpha^{\max}\beta^j\}_{j=0,1,\ldots}$ such that

$$g^k(\mathbf{V}^k+\alpha^k\mathbf{D}^k;\hat{\mathbf{V}}) - g^k(\mathbf{V}^k;\hat{\mathbf{V}}) \ge \sigma\alpha^k\,(g^k)'_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}). \quad (56)$$
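To illustrate Steps 1-3 with the choice (51), the Python sketch below uses a deliberately small setting (all values hypothetical): one user and two single-antenna BSs in the cell, so that $\mathbf{J}^k$ is a 2×2 Hermitian positive definite matrix and each block is a scalar. With $\mathbf{G}^k = -2\,\mathrm{blkdiag}(\mathbf{J}^k)$ and the factor $\frac{1}{2}$ in (Q), each BS's direction problem reduces to maximizing $2\,\mathrm{Re}(b_q^*y) - J_{qq}|y|^2$ over $|y|^2\le P_q$, where $b_q = s_q - \sum_{p\neq q}J_{qp}v_p$. All BSs solve this problem in closed form from the same current point. The step size is then chosen by the Armijo test (56) with $\alpha^{\max} = 1$. The sketch keeps $\mathbf{J}^k$ and $\mathbf{S}_{i_k}$ fixed, i.e., it omits the lower-bound update of Step 4.

```python
import math


def objective(J, s, v):
    """g^k up to a constant for the scalar-block toy: 2 Re(s^H v) - v^H J v."""
    n = len(v)
    lin = sum(s[q].conjugate() * v[q] for q in range(n))
    quad = sum(v[p].conjugate() * J[p][q] * v[q] for p in range(n) for q in range(n))
    return 2.0 * lin.real - quad.real


def inexact_sca_step(J, s, v, powers, beta=0.5, sigma=0.1):
    """One pass of Steps 1-3: per-BS directions from the decoupled (Q), then the Armijo test (56)."""
    n = len(v)
    # S[q] - sum_p J[q,p] v[p]: the coefficient of D^{q} in the directional derivative.
    grad = [s[q] - sum(J[q][p] * v[p] for p in range(n)) for q in range(n)]
    Y = []
    for q in range(n):  # each BS solves its own ball-constrained problem
        b = grad[q] + J[q][q] * v[q]  # = s[q] - sum_{p != q} J[q][p] v[p]
        y = b / J[q][q].real
        if abs(y) ** 2 > powers[q]:
            y = math.sqrt(powers[q]) * b / abs(b)
        Y.append(y)
    D = [Y[q] - v[q] for q in range(n)]
    slope = 2.0 * sum((grad[q].conjugate() * D[q]).real for q in range(n))  # (g^k)'_D at v
    base = objective(J, s, v)
    alpha = 1.0  # alpha_max = 1 keeps v + alpha*D inside the power balls
    while alpha > 1e-12:
        trial = [v[q] + alpha * D[q] for q in range(n)]
        if objective(J, s, trial) - base >= sigma * alpha * slope:
            return trial
        alpha *= beta
    return list(v)  # no acceptable step: keep the current point


J = [[2.0 + 0j, 0.5 - 0.5j], [0.5 + 0.5j, 1.0 + 0j]]  # hypothetical J^k, Hermitian PD
s = [1.0 + 1.0j, 2.0 - 0.5j]                          # hypothetical S_{i_k}
powers = [0.3, 0.5]
v = [0j, 0j]
values = [objective(J, s, v)]
for _ in range(100):
    v = inexact_sca_step(J, s, v, powers)
    values.append(objective(J, s, v))
print(v, values[-1])
```

Because $\alpha\le 1$, each trial point $(1-\alpha)\mathbf{V} + \alpha\mathbf{Y}$ is a convex combination of feasible points, so every accepted update respects the per-BS power constraints.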

The proposed inexact-SCA algorithm for precoder design in a HetNet with intra-cell ComP and inter-cell CB is detailed in Table V. It is easily seen that the condition in the statement of Theorem 2 is satisfied, that is,

$$(g^k)''_{\mathbf{D}^k}(\mathbf{V}^k;\hat{\mathbf{V}}) = -2\,\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{J}^k\mathbf{D}^k\big] \ge -2\rho(\mathbf{J}^k)\,\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{D}^k\big], \quad (57)$$

with $\rho(\mathbf{J}^k)$, the largest eigenvalue of $\mathbf{J}^k$, being upper bounded for any feasible $\mathbf{V}\in\mathcal{V}$. The convergence of the algorithm given in Table V is then a straightforward consequence of Theorem 2.

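TABLE V

The Inexact-SCA Algorithm for Solving (SYSTEM) with Intra-Cell ComP and Inter-Cell CB

S1): Initialization: Obtain a feasible solution $\mathbf{V}_{i_k}(0)$ for all $i_k\in\mathcal{I}$;

S2): For each BS $k$, compute $\mathbf{C}_{i_k}(t)$, $\mathbf{E}_{i_k}(t)$, $\mathbf{U}_{i_k}(t)$ and $c_{i_k}(t)$ according to (24a)–(24d), for all $i_k\in\mathcal{I}_k$;

S3): For each BS $k$, compute the following:
S3a): $\mathbf{J}^k(t) = \sum_{\ell_j\in\mathcal{I}}(\mathbf{H}^k_{\ell_j})^H\mathbf{U}_{\ell_j}(t)(\mathbf{E}_{\ell_j}(t))^{-1}(\mathbf{U}_{\ell_j}(t))^H\mathbf{H}^k_{\ell_j}$;
S3b): $\mathbf{S}_{i_k}(t) = c_{i_k}(t)(\mathbf{H}^k_{i_k})^H\mathbf{U}_{i_k}(t)(\mathbf{E}_{i_k}(t))^{-1}$, $\forall\, i_k\in\mathcal{I}_k$;

S4): For each cell $k$, compute the precoders $\mathbf{V}^k(t)$ by the following steps:
Compute $\mathbf{Y}^{q_k}(t)$ using (55), $\forall\, q_k\in\mathcal{Q}_k$;
Let $\mathbf{D}^{q_k}(t) = \mathbf{Y}^{q_k}(t) - \mathbf{V}^{q_k}(t-1)$, $\forall\, q_k\in\mathcal{Q}_k$;
Perform the Armijo line search (56) to determine $\alpha^k(t)$;
Let $\mathbf{V}^{q_k}(t) = \mathbf{V}^{q_k}(t-1) + \alpha^k(t)\mathbf{D}^{q_k}(t)$, $\forall\, q_k\in\mathcal{Q}_k$;

S5): Until some stopping criterion is met.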
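To make the control flow of Step S4 concrete, here is a toy sketch; it is not the paper's implementation. Each "BS" $q$ owns a single real scalar $v_q$ with $v_q^2 \le P$, and the cell objective is a fixed concave quadratic $f(\mathbf{v}) = -\mathbf{v}^T\mathbf{J}\mathbf{v} + 2\mathbf{s}^T\mathbf{v}$ that stands in for the surrogate built from $\mathbf{J}^k(t)$ and $\mathbf{S}_{i_k}(t)$ (so Steps S2–S3 are frozen). Each BS computes its closed-form update with the other BSs held at the current point (the scalar analogue of (55), where clipping to the ball plays the role of $\lambda^{q_k}$), the directions are stacked, and one cell-wide stepsize is chosen by Armijo backtracking before all BSs update together. All names are hypothetical.

```python
import math


def inexact_sca_toy(j, s, power, iters=500, beta=0.5, sigma=0.1):
    """Toy version of Step S4 in Table V for one cell with scalar per-BS 'precoders'.

    Maximizes f(v) = -v^T J v + 2 s^T v subject to v_q**2 <= power for every BS q.
    J must be symmetric positive definite.
    """
    n = len(s)
    radius = math.sqrt(power)
    v = [0.0] * n  # S1: a feasible starting point

    def f(x):
        quad = sum(x[p] * j[p][q] * x[q] for p in range(n) for q in range(n))
        return -quad + 2.0 * sum(sq * xq for sq, xq in zip(s, x))

    for _ in range(iters):
        # Each BS solves its own ball-constrained problem with the others frozen at v.
        y = []
        for q in range(n):
            rhs = s[q] - sum(j[q][p] * v[p] for p in range(n) if p != q)
            y.append(max(-radius, min(radius, rhs / j[q][q])))
        d = [yq - vq for yq, vq in zip(y, v)]  # D^q = Y^q - V^q
        if max(abs(dq) for dq in d) < 1e-12:
            break
        grad = [2.0 * (s[q] - sum(j[q][p] * v[p] for p in range(n))) for q in range(n)]
        slope = sum(g * dq for g, dq in zip(grad, d))
        # One common stepsize for the whole cell, chosen by Armijo backtracking.
        base, alpha = f(v), 1.0
        for _trial in range(60):
            if f([vq + alpha * dq for vq, dq in zip(v, d)]) - base >= sigma * alpha * slope:
                break
            alpha *= beta
        else:
            break  # no acceptable step at machine precision: stop
        v = [vq + alpha * dq for vq, dq in zip(v, d)]
    return v
```

With $\mathbf{J} = [[2, 0.5], [0.5, 1]]$, $\mathbf{s} = [1, 1]$ and a loose budget, the iterates approach $\mathbf{J}^{-1}\mathbf{s} \approx (0.2857, 0.8571)$; with $P = 0.01$ both blocks end on the boundary at $0.1$.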



VI. Numerical Results 

In this section, we conduct numerical experiments to validate the effectiveness of the proposed algorithms. Both the exact and inexact versions of the SCA algorithm are tested in three main settings: 1) multicell downlink linear precoder design (i.e., the IBC model); 2) HetNet downlink linear precoder design with inter-cell CB and intra-cell JP; 3) HetNet downlink joint clustering and linear precoder design with intra-cell partial ComP.

The general setup for the experiments is as follows. We consider a multicell network of up to 10 cells. The distance between the centers of two adjacent cells is set to 500 meters (representing a HetNet with densely deployed cells; see Fig. 5 for an illustration). Both the BSs and the users are randomly placed in each cell. Let $y^{q_j}_{i_k}$ denote the distance between BS $q_j$ and user $i_k$. The channel coefficients between user $i_k$ and BS $q_j$ are modeled as a zero-mean circularly symmetric complex Gaussian vector with $(200/y^{q_j}_{i_k})^3 L^{q_j}_{i_k}$ as the variance for both the real and imaginary dimensions, where $10\log_{10}(L^{q_j}_{i_k}) \sim \mathcal{N}(0,64)$ is a real Gaussian random variable modeling the shadowing effect. We set the noise power $\sigma^2_{i_k} = 1$ for all $i_k$ and the power budget $P^{q_k} = P$ for all $q_k$. We define the total transmission power for cell $k$ as $P^{\rm tot}_k = P|\mathcal{Q}_k|$.
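The channel model above can be sketched with the standard library alone. Assumptions beyond the text, labeled here and in the comments: the path-loss exponent 3 is our reading of the variance expression, one shadowing draw is shared by all antenna pairs of a BS–user link, and each link is an $N \times M$ matrix. Since $10\log_{10}(L) \sim \mathcal{N}(0, 64)$, the shadowing standard deviation is 8 dB. The names `channel_variance` and `draw_channel` are hypothetical.

```python
import math
import random


def channel_variance(distance_m, shadow_db, exponent=3.0):
    """Per-dimension variance (200 / y)^exponent * L with 10*log10(L) = shadow_db.

    exponent = 3 is an assumption (our reading of the setup).
    """
    return (200.0 / distance_m) ** exponent * 10.0 ** (shadow_db / 10.0)


def draw_channel(distance_m, n_rx, n_tx, rng):
    """Draw an n_rx x n_tx complex Gaussian channel for one BS-user link.

    Assumption: a single shadowing value is shared by all antenna pairs of the link.
    """
    shadow_db = rng.gauss(0.0, 8.0)  # N(0, 64) in dB, i.e. standard deviation 8 dB
    std = math.sqrt(channel_variance(distance_m, shadow_db))
    # Real and imaginary parts are independent with the same variance.
    return [[complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n_tx)]
            for _ in range(n_rx)]
```

Passing an explicit `random.Random(seed)` keeps each of the independent network realizations reproducible.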

The stopping criteria are chosen as follows. The single-time-scale algorithm as well as the outer loop of the two-time-scale algorithm stop when the relative change of the objective value, $|u(t)-u(t-1)|/|u(t)|$, falls below a prescribed threshold. The inner loop of the two-time-scale algorithm stops when the relative increase of the objective value of the related subproblem (i.e., problem (VIBC-BLK)) falls below a prescribed threshold after one round of updates by all the BSs in the cell.




Fig. 5. Cell configuration for numerical experiments.

Fig. 6. Comparison of the system throughput of different algorithms in the HetNet setting with different sizes of the network. $K = [2, 4, 6, 8, 10]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 1$ or $d_{i_k} = 3$ for all $i_k\in\mathcal{I}$.



A. HetNet and Multicell Downlink Setting 

In this subsection, the performance of the following algorithms will be demonstrated and compared:

1) WMMSE Algorithm [24]: This algorithm is the one described in Table I. As this algorithm in its original form cannot deal with the per-BS power constraints in the HetNet setting, we also consider a simple extension of it, in which the WMMSE algorithm is followed by a power normalization step for each BS $q_k\in\mathcal{Q}_k$. The latter algorithm is abbreviated as "WMMSE-N";

2) SCA-IN for IBC: This algorithm is a simplification of the one described in Table V for the IBC setting. It solves the same problem as the WMMSE algorithm;

3) SCA for HetNet: This algorithm is the two-time-scale algorithm described in Table IV;

4) SCA-IN for HetNet: This algorithm is the inexact algorithm described in Table V;

5) ZF-SCA for IBC: This algorithm is the intra-cell ZF plus inter-cell CB algorithm described in Table III;

6) Per-Cell ZF for HetNet [21]: This algorithm is Algorithm 2 proposed in [21]. It performs intra-cell ZF for the HetNet setting with per-BS power constraints, while completely ignoring the inter-cell interference.

All the algorithms considered in this subsection use the users' rates as their utility functions, i.e., they maximize the system sum rate. The plots shown represent the averaged performance of the algorithms over 100 independent network realizations.
Fig. 7. Comparison of the CPU time required for computation in the HetNet setting with different sizes of the network. $K = [2, 4, 6, 8, 10]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 1$ for all $i_k\in\mathcal{I}$.

Fig. 8. Comparison of the CPU time required for computation in the HetNet setting with different sizes of the network. $K = [2, 4, 6, 8, 10]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 3$ for all $i_k\in\mathcal{I}$.

Fig. 9. Comparison of the system throughput of different algorithms in the HetNet setting with different levels of transmit power. $K = 5$, $P^{\rm tot}_k = [0, 10, 20, 30, 40, 50]$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 1$ or $d_{i_k} = 3$ for all $i_k\in\mathcal{I}$.

Fig. 10. Comparison of the CPU time needed for computation of different algorithms in the HetNet setting with different levels of transmit power. $K = 5$, $P^{\rm tot}_k = [0, 10, 20, 30, 40, 50]$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 1$ for all $i_k\in\mathcal{I}$.



Our first set of experiments compares the performance of the first four algorithms listed above. In Fig. 6–Fig. 8, the averaged system sum rate achieved by the different algorithms as well as the averaged CPU time used is compared for a network with $|\mathcal{Q}_k| = 6$, $M = 5$, $N = 3$ and $|\mathcal{I}_k| = 10$. In Fig. 9–Fig. 11, a similar set of experiments is performed for networks with different SNR values. For the WMMSE and the "SCA-IN for IBC" algorithms, the per-BS power constraints are completely ignored; instead, a single per-cell power budget is assumed. Several interesting observations can be made. First, the WMMSE algorithm and the SCA-IN algorithm in the IBC setting have almost identical performance in terms of both the achieved sum rate and the required computational time. Second, in the HetNet setting the SCA-IN algorithm is much more efficient than the SCA algorithm. Remarkably, it uses the same computational resources as the WMMSE algorithm, while at the same time being able to enforce multiple per-BS power constraints. Additionally, the heuristic algorithm that combines the WMMSE with a simple normalization (WMMSE-N) leads to poor sum rate performance.
Fig. 11. Comparison of the CPU time needed for computation of different algorithms in the HetNet setting with different levels of transmit power. $K = 5$, $P^{\rm tot}_k = [0, 10, 20, 30, 40, 50]$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = 10$, $M = 5$, $N = 3$, $d_{i_k} = 3$ for all $i_k\in\mathcal{I}$.

Fig. 12. Comparison of the system sum rate achieved by different algorithms with different levels of transmit power. $K = 5$, $P^{\rm tot}_k = [0, 10, 20, 30, 40, 50]$ dB, $|\mathcal{Q}_k| = 5$, $|\mathcal{I}_k| = 10$, $M = 4$, $N = 2$, $d_{i_k} = 2$ for all $i_k\in\mathcal{I}$.



The second set of experiments compares the performance of the schemes that utilize general linear precoding with that of the ZF precoding schemes. The results are summarized in Fig. 12–Fig. 14. In Fig. 12 we show the performance of the algorithms in the network with 5 cells and $|\mathcal{Q}_k| = 5$, $|\mathcal{I}_k| = 10$, $M = 4$, $N = 2$ and $d_{i_k} = 2$. Note that we have chosen the network parameters such that the feasibility condition for ZF precoding is exactly satisfied, that is, $M|\mathcal{Q}_k| = N|\mathcal{I}_k|$. This is to say that for the ZF-based schemes, all the resources are dedicated to eliminating intra-cell interference, while the inter-cell interference is ignored. However, as suggested by Fig. 12, this is not an ideal strategy to deal with interference in densely deployed HetNets. The reason is that when the cells and the BSs are densely deployed, inter-cell interference is as detrimental to the users' rates as intra-cell interference. The general linear precoding approach, which does not pre-specify which interference is to be cancelled, appears to be a more balanced way of dealing with the interference. This phenomenon is further highlighted in Fig. 13. In this figure, when $|\mathcal{I}_k|$ approaches the maximum number of users for which the ZF strategy is still feasible (12 in this case, as $M|\mathcal{Q}_k|/N = 12$), the performance of both ZF-based schemes drops sharply. On the other hand, when there is a sufficient number of transmitters (i.e., BSs) in each cell, the ZF-based schemes perform as well as the general linear precoding based schemes.
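The feasibility arithmetic above can be checked directly. This small sketch encodes the condition exactly as stated in the text, $M|\mathcal{Q}_k| \ge N|\mathcal{I}_k|$ (the inequality form is our reading of "feasibility condition"); the function names are hypothetical.

```python
def zf_feasible(m_antennas, n_bs, n_rx, n_users):
    """Intra-cell ZF feasibility as stated in the text: M * |Q_k| >= N * |I_k|."""
    return m_antennas * n_bs >= n_rx * n_users


def zf_max_users(m_antennas, n_bs, n_rx):
    """Largest |I_k| for which intra-cell ZF is still feasible."""
    return (m_antennas * n_bs) // n_rx
```

For the Fig. 12 setting ($M = 4$, $|\mathcal{Q}_k| = 5$, $N = 2$, $|\mathcal{I}_k| = 10$) the condition holds with equality ($20 = 20$), and for the Fig. 13 setting ($|\mathcal{Q}_k| = 6$) the limit is $4 \cdot 6 / 2 = 12$ users per cell.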



B. Partial ComP in HetNet 

In this subsection, we aim at jointly designing the clustering and linear precoding schemes in a partial ComP setting. To induce the desired clustering structure, we specialize the penalty terms in the objective to the following form [54]: $s^{q_k}_{i_k}(\mathbf{V}) = \lambda\|\mathbf{V}^{q_k}_{i_k}\|_2$, $\forall\, i_k, q_k$, where $\lambda > 0$ is chosen appropriately to balance the resulting cluster size and the throughput performance. We compare the performance of the following three algorithms:

1) WMMSE Algorithm: This is our baseline algorithm, which optimizes the precoders by treating all the BSs in each cell as a single virtual BS. The clustering structure is not optimized.
Fig. 13. Comparison of the system sum rate achieved by different algorithms with different numbers of users per cell. $K = 5$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 6$, $|\mathcal{I}_k| = [2, 4, 5, \ldots, 12]$, $M = 4$, $N = 2$, $d_{i_k} = 2$ for all $i_k\in\mathcal{I}$.

Fig. 14. Comparison of the system sum rate achieved by different algorithms with different numbers of cells. $K = [1, 2, \ldots, 5]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = [5, 10]$, $|\mathcal{I}_k| = 10$, $M = 4$, $N = 2$, $d_{i_k} = 2$ for all $i_k\in\mathcal{I}$.



2) SCA Algorithm: This is the algorithm proposed in [54], which can be viewed as a special case of the SCA algorithm for solving the penalized utility maximization problem; see Remark 6.

3) SCA-IN Algorithm: This algorithm uses the inexact-SCA approach to solve the penalized utility maximization problem.
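The penalty $\lambda\|\mathbf{V}^{q_k}_{i_k}\|_2$ is what drives entire BS-to-user blocks to zero. The paper's SCA and SCA-IN subproblem solvers are not reproduced here; as a standard illustration (a swapped-in technique, not claimed to be the paper's update), the sketch below shows the proximal operator of a scaled Euclidean norm, block soft-thresholding, which switches a block off whenever its norm falls below the threshold, and counts the resulting cluster size. It assumes $d_{i_k} = 1$, so each block is an $M$-vector, as in Figs. 15–16; the function names are hypothetical.

```python
import math


def block_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2 for a (complex) vector given as a list."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    if norm <= thresh:
        return [0j] * len(v)  # the whole BS-to-user block is switched off
    scale = 1.0 - thresh / norm  # shrink the block toward zero, keeping its direction
    return [scale * x for x in v]


def cluster_size(blocks):
    """Number of BSs serving a user, i.e. the number of nonzero precoder blocks."""
    return sum(1 for b in blocks if any(x != 0 for x in b))
```

A larger threshold (a larger $\lambda$) zeroes more blocks, which is the trade-off between cluster size and throughput that the choice of $\lambda$ balances.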

Our results are summarized in Fig. 15–Fig. 16 as well as in Table VI. We see that both the SCA- and the SCA-IN-based approaches are able to retain a large portion of the system sum rate achieved by full per-cell cooperation, while using only small cluster sizes. In contrast, the precoders generated by the WMMSE algorithm are void of any clustering structure: they always mandate all the BSs to transmit to each user. Comparing the SCA and SCA-IN algorithms, we see that their performance is almost identical in terms of system throughput and generated cluster sizes. However, the inexact version is computationally much more efficient than its exact counterpart: according to Table VI, for $K = 10$ and $|\mathcal{I}_k| = 20$ it requires 30.8 s of CPU time versus 121.3 s for SCA.
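To quantify the gap between SCA and SCA-IN, the ratios of the CPU times in Table VI can be computed directly. This is plain arithmetic on the reported numbers, not a new measurement; the dictionary layout and the name `speedups` are our own.

```python
# CPU times in seconds, copied from Table VI, for K = 1, ..., 10.
CPU_TIME = {
    ("SCA", 10): [6.0, 9.9, 13.2, 17.1, 19.8, 24.7, 28.2, 33.7, 39.8, 42.5],
    ("SCA-IN", 10): [0.8, 1.7, 2.7, 3.4, 4.4, 6.0, 6.9, 8.5, 9.8, 11.8],
    ("SCA", 20): [20.6, 28.3, 34.4, 45.5, 57.1, 70.8, 80.9, 87.4, 106.7, 121.3],
    ("SCA-IN", 20): [2.6, 4.6, 6.1, 8.9, 11.5, 14.4, 18.4, 21.2, 25.6, 30.8],
}


def speedups(users):
    """Ratio of SCA to SCA-IN CPU time for each K, rounded to one decimal."""
    exact, inexact = CPU_TIME[("SCA", users)], CPU_TIME[("SCA-IN", users)]
    return [round(e / i, 1) for e, i in zip(exact, inexact)]
```

Across both user loads, the ratios range from about 3.6 to 7.9, i.e., the inexact version is consistently several times faster.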
Fig. 15. Comparison of the system throughput of different algorithms in the HetNet setting with different numbers of cells. $K = [1, 2, \ldots, 10]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 10$, $|\mathcal{I}_k| = [10, 20]$, $M = 4$, $N = 2$, $d_{i_k} = 1$ for all $i_k\in\mathcal{I}$. $\lambda = 0.1$ when $|\mathcal{I}_k| = 10$, and $\lambda = 0.05$ when $|\mathcal{I}_k| = 20$.

Fig. 16. Comparison of the averaged cluster size generated by different algorithms in the HetNet setting with different numbers of cells. $K = [1, 2, \ldots, 10]$, $P^{\rm tot}_k = 30$ dB, $|\mathcal{Q}_k| = 10$, $|\mathcal{I}_k| = [10, 20]$, $M = 4$, $N = 2$, $d_{i_k} = 1$ for all $i_k\in\mathcal{I}$. $\lambda = 0.1$ when $|\mathcal{I}_k| = 10$, and $\lambda = 0.05$ when $|\mathcal{I}_k| = 20$.
TABLE VI

CPU Time Needed for Different Algorithms (Unit: Second)

Algorithm            K=1   K=2   K=3   K=4   K=5   K=6   K=7   K=8   K=9    K=10
WMMSE  (|I_k| = 10)  0.8   2.5   3.1   5.2   6.6   8.2   10.1  11.2  14.4   17.0
SCA    (|I_k| = 10)  6.0   9.9   13.2  17.1  19.8  24.7  28.2  33.7  39.8   42.5
SCA-IN (|I_k| = 10)  0.8   1.7   2.7   3.4   4.4   6.0   6.9   8.5   9.8    11.8
WMMSE  (|I_k| = 20)  2.6   6.5   11.0  15.6  21.7  28.1  36.0  44.5  52.4   62.5
SCA    (|I_k| = 20)  20.6  28.3  34.4  45.5  57.1  70.8  80.9  87.4  106.7  121.3
SCA-IN (|I_k| = 20)  2.6   4.6   6.1   8.9   11.5  14.4  18.4  21.2  25.6   30.8



VII. Conclusion 

In this paper, we have addressed an important family of interference management problems arising in heterogeneous networks. The main novelty of this work lies in achieving decomposition across interference-coupled networks by using the technique of successive convex approximation. The proposed approach has low computational complexity, as each of the subproblems to be solved is convex and completely decomposes across the cells. Depending on the way the subproblems are solved, two general algorithms (the exact and the inexact SCA algorithms) have been proposed, both of which can be applied to many practical interference management problems. We believe that the framework studied in this paper is extendable to many other important resource and interference management problems beyond those mentioned in this work.
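Appendix

A. An Alternative Proof of Lemma 7

Proof: Note that we have $\mathbf{C}_{i_k} \succ 0$. Thus, in order to show that $l_{i_k}(\mathbf{V}_{i_k},\mathbf{C}_{i_k})$ is jointly convex with respect to $(\mathbf{V}_{i_k},\mathbf{C}_{i_k})$ in their respective domains, it is sufficient to show that for all $\mathbf{D}_{i_k}\in\mathbb{C}^{M|\mathcal{Q}_k|\times d_{i_k}}$, all Hermitian $\mathbf{M}_{i_k}\in\mathbb{C}^{N\times N}$, and all $t$ such that $\mathbf{C}_{i_k}+t\mathbf{M}_{i_k}\succ 0$, the second-order derivative of $l_{i_k}(\mathbf{V}_{i_k}+t\mathbf{D}_{i_k},\mathbf{C}_{i_k}+t\mathbf{M}_{i_k})$ with respect to $t$ is nonnegative [60].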
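Let us first investigate the first-order directional derivative. For brevity, in the rest of this proof we drop the user index $i_k$ and write $\mathbf{X}(t) \triangleq \mathbf{C}+t\mathbf{M}$ and $\mathbf{W}(t) \triangleq \mathbf{V}+t\mathbf{D}$ (the argument $t$ is omitted below). We have that

$$\frac{d\,l(\mathbf{V}+t\mathbf{D},\mathbf{C}+t\mathbf{M})}{dt} = \underbrace{\mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{D}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{H}\mathbf{W}\big]}_{a(t)} + \underbrace{\mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{W}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{H}\mathbf{D}\big]}_{b(t)} \underbrace{-\,\mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{W}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{W}\big]}_{c(t)}. \qquad (59)$$

For each of the above three terms, we take a further derivative with respect to $t$, using $d\mathbf{X}^{-1}/dt = -\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}$ and $d\mathbf{W}/dt = \mathbf{D}$. For the first term, we have

$$\frac{da(t)}{dt} = \mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{D}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{H}\mathbf{D}\big] - \mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{D}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{W}\big]. \qquad (60)$$

Similarly, for the second term we have

$$\frac{db(t)}{dt} = \mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{D}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{H}\mathbf{D}\big] - \mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{W}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{D}\big]. \qquad (61)$$

For the third term, we have

$$\frac{dc(t)}{dt} = -\mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{D}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{W}\big] - \mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{W}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{D}\big] + 2\,\mathrm{Tr}\big[\mathbf{E}^{-1}\mathbf{W}^H\mathbf{H}^H\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{W}\big]. \qquad (62)$$

For notational simplicity, let us define

$$\mathbf{A} \triangleq \mathbf{X}^{-1/2}\mathbf{H}\mathbf{D}, \qquad \mathbf{B} \triangleq \mathbf{X}^{-1/2}\mathbf{M}\mathbf{X}^{-1}\mathbf{H}\mathbf{W}. \qquad (63)$$

Note that $\mathbf{X}^{1/2}$ is well defined because of the assumption $\mathbf{C}+t\mathbf{M} \succ 0$.

Utilizing (60)–(62), we have the following expression for the second-order derivative of $l(\mathbf{V}+t\mathbf{D},\mathbf{C}+t\mathbf{M})$:

$$\frac{da(t)}{dt} + \frac{db(t)}{dt} + \frac{dc(t)}{dt} = 2\,\mathrm{Tr}\big[\mathbf{E}^{-1}(\mathbf{A}-\mathbf{B})^H(\mathbf{A}-\mathbf{B})\big] \ge 0, \qquad (64)$$

where the last inequality follows from the facts that $(\mathbf{A}-\mathbf{B})^H(\mathbf{A}-\mathbf{B}) \succeq 0$ and $\mathbf{E} \succ 0$ (hence $\mathbf{E}^{-1} \succ 0$), and that the trace of the product of two positive semidefinite matrices is nonnegative. This completes the proof. ■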
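In the scalar case (all matrices $1 \times 1$, real, with $c + tm > 0$ and $e > 0$), the function reduces to $h^2(v+td)^2/(e(c+tm))$, and (64) says its second derivative equals $2e^{-1}(a-b)^2$ with the scalar versions of (63). The sketch below is a numerical sanity check of that identity against a central finite difference; it illustrates the algebra and is not a substitute for the proof. Variable names (`dv`, `dc` for $\mathbf{D}$, $\mathbf{M}$) are our own.

```python
import math


def l_scalar(v, c, h, e):
    """Scalar analogue of Tr[E^{-1} V^H H^H C^{-1} H V] for real v, h and c, e > 0."""
    return h * h * v * v / (e * c)


def second_derivative_fd(v, c, dv, dc, h, e, t=0.0, step=1e-4):
    """Central finite-difference estimate of d^2/dt^2 l(v + t dv, c + t dc)."""
    def g(s):
        return l_scalar(v + s * dv, c + s * dc, h, e)
    return (g(t + step) - 2.0 * g(t) + g(t - step)) / step ** 2


def second_derivative_closed_form(v, c, dv, dc, h, e, t=0.0):
    """2 E^{-1} (A - B)^2 with the scalar versions of A and B from (63)."""
    x = c + t * dc  # scalar counterpart of C + t M (must stay positive)
    a = h * dv / math.sqrt(x)
    b = dc * h * (v + t * dv) / x ** 1.5
    return 2.0 / e * (a - b) ** 2
```

Agreement of the two functions on a few random inputs, together with the closed form being a square, mirrors the nonnegativity argument used after (64).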

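B. Proof of Lemma 2

Proof: We have the following series of inequalities:

$$\begin{aligned}
u^k(\mathbf{V}^k+\alpha\mathbf{D}^k;\mathbf{V}) - u^k(\mathbf{V}^k;\mathbf{V})
&\overset{(i)}{\ge} \alpha\, g_{\mathbf{D}^k}(\mathbf{V}^k;\mathbf{V}) - \frac{\alpha^2 L^k}{2}\mathrm{Tr}[\mathbf{D}^k(\mathbf{D}^k)^H] - s^k(\mathbf{V}^k+\alpha\mathbf{D}^k) + s^k(\mathbf{V}^k)\\
&\overset{(ii)}{\ge} \alpha\, g_{\mathbf{D}^k}(\mathbf{V}^k;\mathbf{V}) - \frac{\alpha^2 L^k}{2}\mathrm{Tr}[\mathbf{D}^k(\mathbf{D}^k)^H] - \alpha s^k(\mathbf{V}^k+\mathbf{D}^k) - (1-\alpha)s^k(\mathbf{V}^k) + s^k(\mathbf{V}^k)\\
&= \alpha\big(g_{\mathbf{D}^k}(\mathbf{V}^k;\mathbf{V}) - s^k(\mathbf{V}^k+\mathbf{D}^k) + s^k(\mathbf{V}^k)\big) - \frac{\alpha^2 L^k}{2}\mathrm{Tr}[\mathbf{D}^k(\mathbf{D}^k)^H],
\end{aligned}$$

where (i) is from the well-known Descent Lemma [68, Proposition A.32], and (ii) is from the convexity of $s^k(\cdot)$, which is implied by the convexity of $s_{i_k}(\cdot)$ assumed in Assumption B-1. ■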
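A scalar toy instance makes the two steps of the proof tangible. We assume (for illustration only) $f(z) = -\tfrac{L}{2}z^2 + bz$, whose derivative is $L$-Lipschitz so that step (i) holds with equality, and the convex penalty $s(z) = \lambda|z|$, which is where step (ii) creates slack. The function returns the left-hand side minus the right-hand side of Lemma 2, which should be nonnegative for $\alpha \in [0,1]$; its name and default parameters are hypothetical.

```python
def lemma2_gap(x, d, alpha, lip=2.0, b=0.3, lam=0.5):
    """LHS minus RHS of Lemma 2 for a scalar toy problem (should be >= 0 for alpha in [0, 1])."""
    def f(z):  # concave quadratic whose derivative is lip-Lipschitz
        return -0.5 * lip * z * z + b * z

    def s(z):  # convex penalty
        return lam * abs(z)

    def u(z):
        return f(z) - s(z)

    g_d = (-lip * x + b) * d  # directional derivative of f at x along d
    lhs = u(x + alpha * d) - u(x)
    rhs = alpha * (g_d - s(x + d) + s(x)) - 0.5 * alpha ** 2 * lip * d * d
    return lhs - rhs
```

The gap is strictly positive exactly when the segment from $x$ to $x+d$ crosses the kink of $|z|$, e.g. at $x = -0.2$, $d = 1$, $\alpha = 0.5$.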



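C. Proof of Lemma 3

Proof: The fact that the optimal solution of problem (Q) is $\mathbf{D}^k(t) = \mathbf{0}$ implies that, for all feasible $\mathbf{D}^k$, we have

$$0 \ge g_{\mathbf{D}^k}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k\big] - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k) + s^k(\mathbf{V}^k(t-1)). \qquad (65)$$

Clearly, for any $\epsilon\in(0,1)$, $\epsilon\mathbf{D}^k = \epsilon\mathbf{D}^k + (1-\epsilon)\mathbf{0}$ is also a feasible direction. Then for all $\epsilon\in(0,1)$, we have

$$0 \ge g_{\epsilon\mathbf{D}^k}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) + \frac{\epsilon^2}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k\big] - s^k(\mathbf{V}^k(t-1)+\epsilon\mathbf{D}^k) + s^k(\mathbf{V}^k(t-1)).$$

Using the concavity of the function $g^k(\cdot;\mathbf{V}(t-1))$, we obtain

$$0 \ge g^k(\mathbf{V}^k(t-1)+\epsilon\mathbf{D}^k;\mathbf{V}(t-1)) - g^k(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) + \frac{\epsilon^2}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k\big] - s^k(\mathbf{V}^k(t-1)+\epsilon\mathbf{D}^k) + s^k(\mathbf{V}^k(t-1)).$$

Dividing both sides of the above inequality by $\epsilon > 0$ and letting $\epsilon$ go to zero, we obtain, for all feasible $\mathbf{D}^k$,

$$g_{\mathbf{D}^k}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k_{\mathbf{D}^k}(\mathbf{V}^k(t-1)) \le 0, \qquad (66)$$

where $s^k_{\mathbf{D}^k}(\cdot)$ denotes the directional derivative of $s^k$ along $\mathbf{D}^k$. An immediate consequence of this result is that, for all feasible directions $\mathbf{D}$,

$$h_{\mathbf{D}}(\mathbf{V}(t-1);\mathbf{V}(t-1)) - s_{\mathbf{D}}(\mathbf{V}(t-1)) = \sum_{k\in\mathcal{K}}\Big(g_{\mathbf{D}^k}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k_{\mathbf{D}^k}(\mathbf{V}^k(t-1))\Big) \le 0. \qquad (67)$$

Utilizing the fact that the directional derivatives of $h(\cdot;\mathbf{V}(t-1))$ and $f(\cdot)$ coincide at $\mathbf{V}(t-1)$, we obtain that for all feasible directions $\mathbf{D}$, $f_{\mathbf{D}}(\mathbf{V}(t-1)) - s_{\mathbf{D}}(\mathbf{V}(t-1)) \le 0$, which says that $\mathbf{V}(t-1)$ is a stationary solution of problem (SYSTEM). ■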



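D. Proof of Theorem 2

Proof: We first show that $\mathbf{D}^k(t)$ is in essence an "ascent direction" of the function $u^k(\cdot;\mathbf{V}(t-1))$, that is, we have

$$g_{\mathbf{D}^k(t)}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)) + s^k(\mathbf{V}^k(t-1)) > 0, \quad \forall\, \mathbf{D}^k(t)\neq\mathbf{0}. \qquad (68)$$

Clearly, $\mathbf{D}^k = \mathbf{0}$ is a feasible solution for problem (Q), which yields an objective value of $-s^k(\mathbf{V}^k(t-1))$. From the optimality of the solution $\mathbf{D}^k(t)$, it follows that

$$g_{\mathbf{D}^k(t)}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t))^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k(t)\big] - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)) + s^k(\mathbf{V}^k(t-1)) \ge 0.$$

This inequality combined with the strict negative definiteness of the matrix $\mathbf{G}^k(\mathbf{V}(t-1))$ further implies that

$$\begin{aligned}
g_{\mathbf{D}^k(t)}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)) + s^k(\mathbf{V}^k(t-1))
&\ge -\frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t))^H\mathbf{G}^k(\mathbf{V}(t-1))\mathbf{D}^k(t)\big]\\
&\ge \frac{b^k}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t))^H\mathbf{D}^k(t)\big] > 0, \quad \forall\, \mathbf{D}^k(t)\neq\mathbf{0},
\end{aligned} \qquad (69)$$

where $b^k > 0$ is a constant such that $-\mathbf{G}^k(\mathbf{V}(t-1)) \succeq b^k\mathbf{I}$.

This shows (68). When the Armijo rule (48) is used for step-size selection, the inequality (68) implies that

$$u^k(\mathbf{V}^k(t);\mathbf{V}(t-1)) - u^k(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) \ge \sigma\alpha^k(t)\Big(g_{\mathbf{D}^k(t)}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)) + s^k(\mathbf{V}^k(t-1))\Big) > 0. \qquad (70)$$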
Using the lower bound property (20a)–(20b), we obtain (cf. the derivation in (25))

$$u(\mathbf{V}(t)) = f(\mathbf{V}(t)) - s(\mathbf{V}(t)) \ge \sum_{k\in\mathcal{K}}u^k(\mathbf{V}^k(t);\mathbf{V}(t-1)) \ge \sum_{k\in\mathcal{K}}u^k(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) = u(\mathbf{V}(t-1)). \qquad (71)$$

The above series of inequalities shows that $\{u(\mathbf{V}(t))\}$ is a monotonically increasing and convergent sequence. This fact combined with (70) implies that

$$\lim_{t\to\infty}\sigma\alpha^k(t)\Big(g_{\mathbf{D}^k(t)}(\mathbf{V}^k(t-1);\mathbf{V}(t-1)) - s^k(\mathbf{V}^k(t-1)+\mathbf{D}^k(t)) + s^k(\mathbf{V}^k(t-1))\Big) = 0. \qquad (72)$$

We then claim that in the limit, the direction $\mathbf{D}^k(t)$ converges to $\mathbf{0}$. Assume the contrary; then there must exist a constant $\delta > 0$ such that $\limsup_{t\to\infty}\|\mathbf{D}^k(t)\| = \delta$. Then there must exist an infinite subsequence $\{t_r\}_{r=1}^{\infty}$ such that $\lim_{r\to\infty}\|\mathbf{D}^k(t_r)\| = \delta > 0$. We next show that along such a subsequence, the stepsize $\alpha^k(t_r)$ is lower bounded, that is,

$$\exists\, c^k > 0 \ \ \text{s.t.}\ \ 0 < c^k \le \alpha^k(t_r) \le 1, \quad \forall\, r. \qquad (73)$$

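Let us suppose that at iteration $t_r$, the $\ell$-th trial of the line search is successful, so that $\alpha^k(t_r) = \alpha^{\rm init}\beta^{\ell}$ (if the first trial is successful, then $\alpha^k(t_r) = \alpha^{\rm init}$, and (73) holds with $c^k = \min\{\alpha^{\rm init}, \beta(1-\sigma)b^k/L^k\}$). Then, according to the Armijo rule, the $(\ell-1)$-th trial, which used the stepsize $\alpha^{\rm init}\beta^{\ell-1} = \alpha^k(t_r)\beta^{-1}$, must have failed, i.e.,

$$u^k\big(\mathbf{V}^k(t_r-1)+\alpha^k(t_r)\beta^{-1}\mathbf{D}^k(t_r);\mathbf{V}(t_r-1)\big) < u^k(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) + \sigma\alpha^k(t_r)\beta^{-1}\Big(g_{\mathbf{D}^k(t_r)}(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) - s^k(\mathbf{V}^k(t_r-1)+\mathbf{D}^k(t_r)) + s^k(\mathbf{V}^k(t_r-1))\Big).$$

From Lemma 2 (applied with $\alpha = \alpha^k(t_r)\beta^{-1} \le 1$), we can further obtain

$$\begin{aligned}
&\alpha^k(t_r)\beta^{-1}g_{\mathbf{D}^k(t_r)}(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) - \frac{(\alpha^k(t_r)\beta^{-1})^2L^k}{2}\mathrm{Tr}\big[\mathbf{D}^k(t_r)(\mathbf{D}^k(t_r))^H\big] - \alpha^k(t_r)\beta^{-1}\Big(s^k(\mathbf{V}^k(t_r-1)+\mathbf{D}^k(t_r)) - s^k(\mathbf{V}^k(t_r-1))\Big)\\
&\quad< \sigma\alpha^k(t_r)\beta^{-1}\Big(g_{\mathbf{D}^k(t_r)}(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) - s^k(\mathbf{V}^k(t_r-1)+\mathbf{D}^k(t_r)) + s^k(\mathbf{V}^k(t_r-1))\Big).
\end{aligned}$$

Rearranging terms, and utilizing (69), we obtain

$$\alpha^k(t_r) > \frac{2\beta(1-\sigma)\Big(g_{\mathbf{D}^k(t_r)}(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) - s^k(\mathbf{V}^k(t_r-1)+\mathbf{D}^k(t_r)) + s^k(\mathbf{V}^k(t_r-1))\Big)}{L^k\,\mathrm{Tr}\big[(\mathbf{D}^k(t_r))^H\mathbf{D}^k(t_r)\big]} \ge \frac{\beta(1-\sigma)b^k}{L^k} > 0, \qquad (74)$$

where the last inequality is from the facts that $L^k > 0$, $b^k > 0$, $\beta\in(0,1)$ and $\sigma\in(0,1)$.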
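The bound (74) can be observed on a scalar toy instance. Assumptions, for illustration only: no penalty ($s \equiv 0$), $f(z) = -\tfrac{L}{2}z^2 + bz$ (so $f'$ is $L$-Lipschitz), and a subproblem curvature $G = -b^k$, so that the subproblem maximizer is $D = f'(x)/b^k$. With $\alpha^{\rm init} = 1$, $\beta = 0.5$, $\sigma = 0.1$, every accepted stepsize should be at least $\beta(1-\sigma)b^k/L$. The function name and defaults are hypothetical.

```python
def armijo_stepsize(x, lip, bk, b=1.0, alpha_init=1.0, beta=0.5, sigma=0.1):
    """Armijo stepsize for a scalar toy instance of the SCA step.

    f(z) = -(lip / 2) z^2 + b z (no penalty); the subproblem uses curvature G = -bk,
    so its maximizer is D = f'(x) / bk.
    """
    def f(z):
        return -0.5 * lip * z * z + b * z

    slope = -lip * x + b  # f'(x)
    d = slope / bk  # direction returned by the toy subproblem
    alpha = alpha_init
    # Backtrack until the actual increase reaches sigma times the predicted one.
    while f(x + alpha * d) - f(x) < sigma * alpha * slope * d:
        alpha *= beta
    return alpha
```

Here the largest acceptable stepsize is $2(1-\sigma)b^k/L$, so backtracking can undershoot it by at most a factor $\beta$, which is exactly where the $\beta$ in (74) comes from.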
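Taking a further subsequence of $\{t_r\}$ if necessary, let $\{\mathbf{V}(t_r)\}$ be a converging subsequence, with limit point $\mathbf{V}^*$. From the Armijo step-size selection rule and the series of inequalities in (71), we must have

$$\begin{aligned}
u(\mathbf{V}(t_r)) - u(\mathbf{V}(t_r-1)) &\ge \sum_{k\in\mathcal{K}}\Big(u^k(\mathbf{V}^k(t_r);\mathbf{V}(t_r-1)) - u^k(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1))\Big)\\
&\ge \sum_{k\in\mathcal{K}}\sigma\alpha^k(t_r)\Big(g_{\mathbf{D}^k(t_r)}(\mathbf{V}^k(t_r-1);\mathbf{V}(t_r-1)) - s^k(\mathbf{V}^k(t_r-1)+\mathbf{D}^k(t_r)) + s^k(\mathbf{V}^k(t_r-1))\Big)\\
&\ge \sum_{k\in\mathcal{K}}\sigma\alpha^k(t_r)\frac{b^k}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t_r))^H\mathbf{D}^k(t_r)\big].
\end{aligned}$$

The fact that $\{u(\mathbf{V}(t))\}$ converges implies that

$$\lim_{r\to\infty}\sum_{k\in\mathcal{K}}\sigma\alpha^k(t_r)\frac{b^k}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t_r))^H\mathbf{D}^k(t_r)\big] = 0.$$

Since every term of the sum is nonnegative and $\alpha^k(t_r) \ge c^k > 0$ by (73), this is only possible when $\lim_{r\to\infty}\mathrm{Tr}[(\mathbf{D}^k(t_r))^H\mathbf{D}^k(t_r)] = 0$, which contradicts the assumption $\lim_{r\to\infty}\|\mathbf{D}^k(t_r)\| = \delta > 0$. As a result, we conclude that $\lim_{t\to\infty}\mathbf{D}^k(t) = \mathbf{0}$.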

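Next we claim that every convergent subsequence of $\{\mathbf{V}(t)\}$ converges to a stationary solution of problem (SYSTEM). Let $\{\mathbf{V}(t_\ell)\}$ be a converging subsequence, with limit point $\mathbf{V}^*$. As $\mathbf{D}^k(t_\ell+1)$ is the solution of the convex problem (Q), for all feasible $\mathbf{D}^k$ we have

$$\begin{aligned}
&g_{\mathbf{D}^k(t_\ell+1)}(\mathbf{V}^k(t_\ell);\mathbf{V}(t_\ell)) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k(t_\ell+1))^H\mathbf{G}^k(\mathbf{V}(t_\ell))\mathbf{D}^k(t_\ell+1)\big] - s^k(\mathbf{V}^k(t_\ell)+\mathbf{D}^k(t_\ell+1))\\
&\quad\ge g_{\mathbf{D}^k}(\mathbf{V}^k(t_\ell);\mathbf{V}(t_\ell)) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}(t_\ell))\mathbf{D}^k\big] - s^k(\mathbf{V}^k(t_\ell)+\mathbf{D}^k).
\end{aligned} \qquad (75)$$

Taking the limit on both sides, by the assumed continuity of the directional derivative of $g^k(\cdot)$, the continuity of $\mathbf{G}^k(\mathbf{V})$ and of the convex function $s^k(\cdot)$, as well as the fact that $\lim_{t\to\infty}\mathbf{D}^k(t) = \mathbf{0}$, we obtain that for all feasible $\mathbf{D}^k$,

$$0 \ge g_{\mathbf{D}^k}\big((\mathbf{V}^k)^*;\mathbf{V}^*\big) + \frac{1}{2}\mathrm{Tr}\big[(\mathbf{D}^k)^H\mathbf{G}^k(\mathbf{V}^*)\mathbf{D}^k\big] - s^k\big((\mathbf{V}^k)^*+\mathbf{D}^k\big) + s^k\big((\mathbf{V}^k)^*\big). \qquad (76)$$

Following the same proof as in Lemma 3, we can show that for all feasible directions $\mathbf{D}$, $f_{\mathbf{D}}(\mathbf{V}^*) - s_{\mathbf{D}}(\mathbf{V}^*) \le 0$, which says that $\mathbf{V}^*$ is a stationary solution of problem (SYSTEM). ■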